Cloning genes from Streptomyces cyaneogriseus subsp. noncyanogenus for biosynthesis of antibiotics and methods of use

ABSTRACT

The present invention relates to the complete biosynthetic pathway for the formation of the LL-F28249 compounds and, most importantly, the major component LL-F28249α. The purified and isolated nucleic acid molecule encoding the proteins of the biosynthetic pathway, which is isolated from a wild-type or mutant Streptomyces, is fully described in FIG. 6 to FIG. 6-39 and SEQ ID NO:1. The DNA gene cluster and its expression in a suitable host enable the efficient production of the highly active natural metabolites and semisynthetic derivatives. The invention further concerns plasmids, vectors and host cells that contain and express the novel nucleic acid molecule. Of particular interest, the entire biosynthetic pathway fits compactly in three plasmids, Cos11, Cos36 and Cos40. The invention also concerns the purified and isolated biosynthesis proteins that are encoded by the whole DNA gene cluster. Additionally, the invention involves a new, efficient biochemical method of preparing moxidectin.

CROSS-REFERENCE TO RELATED U.S. APPLICATIONS

This nonprovisional application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 60/471,256, filed on May 16, 2003. The prior application is incorporated herein by reference in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

Not Applicable

REFERENCE TO A “SEQUENCE LISTING”

The material on a single compact disc containing a Sequence Listing file provided in this application is incorporated by reference. The date of creation is ______ and the size is approximately ______.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention concerns the novel biosynthetic genes for encoding the proteins responsible for producing the LL-F28249 compounds and the use thereof to make the active metabolites from the fermentation of Streptomyces cyaneogriseus subsp. noncyanogenus. The invention further concerns the genetic manipulation of the biosynthetic pathway to make active semisynthetic derivatives of the natural metabolites.

2. Description of the Related Art

All patents and publications cited in this specification are hereby incorporated by reference in their entirety.

Streptomyces are producers of a wide variety of commercially important secondary metabolites, including the majority of active antibiotics known as the β-lactams and the macrocyclic lactone compounds or macrolides. Because of the commercial importance of the secondary metabolites produced by Streptomyces, there has been considerable recent investment in the development of methods for the molecular genetic manipulation of Streptomyces. Procedures have been developed for the introduction of genetic material by polyethylene glycol-mediated transformation and by conjugal transfer from Escherichia coli. Vectors have been developed, including high and low copy number vectors, integrative vectors, and E. coli-Streptomyces shuttle vectors. These methods for the molecular genetic manipulation of Streptomyces have been summarized in D. A. Hopwood et al., Genetic Manipulation of Streptomyces, A Laboratory Manual, John Innes Foundation Press, Norwich, UK (1985). In many cases, the genes for the production of secondary metabolites are clustered in Streptomyces. Thus, the identification of a single gene in a biosynthetic gene cluster may lead to the identification of all of the genes responsible for the biosynthesis of the metabolite. This observation has proven to be tremendously valuable, and secondary metabolite biosynthetic gene clusters have been cloned by reverse genetics, complementation of blocked mutants, resistance-gene selection and the use of heterologous probes. Using these methods, nucleotide and predicted amino acid sequence data have been obtained for many macrolide biosynthetic gene clusters, including those directing the synthesis of erythromycin (see S. Donadio et al., Science 252:675-679 (1991) and S. F. Haydock et al., Molecular and General Genetics 230:120-128 (1991)); rapamycin (see T. Schwecke et al., Proceedings of the National Academy of Sciences USA 92:7839-7843 (1995) and X. Ruan et al., Gene 203:1-9 (1997)); FK506 (H. Motamedi and A. Shafiee, European Journal of Biochemistry 256:528-534 (1998)); oleandomycin (D. G. Swan et al., Molecular and General Genetics 242:358-362 (1994)) and rifamycin (see P. R. August et al., Chemistry & Biology 5:69-79 (1998)). However, the complete biosynthetic gene cluster for the macrocyclic lactone compounds known as the LL-F28249 compounds has not yet been described in the art.

There are many reports that molecular genetic manipulations can be used to alter the course of polyketide biosynthesis (see S. Donadio et al., Science 252:675-679 (1991) and S. Donadio et al., Proceedings of the National Academy of Sciences USA 90:7119-7123 (1993)). In those studies, erythromycin-related lactones were produced following manipulation of the 6-deoxyerythronolide B synthase (“DEBS”) gene cluster (the core polyketide synthase gene cluster responsible for erythromycin biosynthesis) such that either the module 4 enoylreductase or the module 5 ketoreductase domain was nonfunctional. Strains containing these variant DEBS gene clusters produced the expected erythromycin-related lactones. These pioneering studies have since been repeated and expanded upon, and the results of many such studies have been reviewed in the literature (see, for example, L. Katz and S. Donadio, Annual Review of Microbiology 47:875-912 (1993); C. R. Hutchinson and I. Fujii, Annual Review of Microbiology 49:201-238 (1995); D. A. Hopwood, Chemical Reviews 97:2465-2497 (1997); and C. W. Carreras and D. V. Santi, Current Opinion in Biotechnology 9:403-411 (1998)).

Data summarized in the literature suggest that the organization of catalytic domains in type I polyketide synthase (“PKS”) modules is conserved, and many highly conserved amino acid sequence motifs have also been described in those biosynthetic gene clusters. For example, the organization of the biosynthetic gene cluster of avermectin, which is produced by S. avermitilis, has been reported (see D. J. MacNeil et al., Gene 115:119-125 (1992) and D. J. MacNeil et al., Annals of the New York Academy of Sciences 721:123-132 (1994)), and partial nucleotide sequences of that biosynthetic gene cluster have been reported or are otherwise available. MacNeil and colleagues have also predicted the modular organization and reported a limited restriction endonuclease map of the wild-type S. cyaneogriseus (NRRL 15773) nemadectin biosynthetic gene cluster (see D. J. MacNeil et al., Annals of the New York Academy of Sciences 721:123-132 (1994)), but their restriction map was incomplete. Their analysis indicated only the presence of nine modular repeats of PKS function and required six overlapping clones to define the 75 kb region of the S. cyaneogriseus genome. MacNeil et al. did not complete the DNA sequencing of the whole biosynthetic gene cluster; instead, the authors sequenced only the ends of selected cosmids, and from this limited sequence information they could generate only a rough restriction endonuclease map. Carbon-13 labeling studies have also been conducted, and a mechanism for the synthesis of the LL-F28249α compound from its constituent acyl units has been proposed (H. R. Tsou et al., Journal of Antibiotics (Tokyo) 42:398-406 (1989)).

The highly active LL-F28249 compounds, which are natural endectocidal agents widely used for the treatment of nematode and arthropod parasites, including the control or prevention of helminthic, arthropod ectoparasitic and acarid infections, are isolated from the fermentation broth of Streptomyces cyaneogriseus subsp. noncyanogenus (hereinafter referred to as “S. cyaneogriseus”). The series of anti-parasitic LL-F28249 compounds produced by S. cyaneogriseus is structurally similar to, but patentably distinct from, the well-characterized avermectins. U.S. Pat. No. 5,106,994 and its continuation U.S. Pat. No. 5,169,956 describe the preparation of the major and minor components, LL-F28249α-λ. The LL-F28249 family of compounds further includes, but is not limited to, the semisynthetic 23-oxo derivatives and 23-imino derivatives of LL-F28249α-λ, which are shown in U.S. Pat. No. 4,916,154. Moxidectin, chemically known as 23-(O-methyloxime)-LL-F28249α, is a particularly potent 23-imino derivative. Other examples of LL-F28249 derivatives include, but are not limited to, 23-(O-methyloxime)-5-(phenoxyacetoxy)-LL-F28249α, 23-(semicarbazone)-LL-F28249α and 23-(thiosemicarbazone)-LL-F28249α.

One of the major nemadectin metabolites, LL-F28249α (hereinafter referred to as “Fα”), is converted to the commercially important compound moxidectin using a four-step chemical process. The determination of the biosynthetic gene cluster of Fα, heretofore unknown, would be of great commercial significance. Isolation of the gene cluster would be highly desirable not only to make the active Fα compound and other natural members of the LL-F28249 family of compounds, but also to prepare the commercially potent semisynthetic derivatives such as moxidectin more quickly and efficiently.

It is therefore an important object of the present invention to isolate and characterize the entire nucleotide sequence encoding the proteins responsible for producing the LL-F28249 compounds, preferably the LL-F28249α metabolite, and then to isolate and determine the function of the amino acid sequences comprising the biosynthesis proteins.

Another object is to provide a new process for isolating natural and semisynthetic derivatives directly from the fermentation broth of bioengineered strains of Streptomyces cyaneogriseus subsp. noncyanogenus.

A further object is to provide a new method for the preparation of moxidectin in an efficient process with fewer steps than heretofore achievable.

Further purposes and objects of the present invention will appear as the specification proceeds.

The foregoing objects are accomplished by providing a new, purified and isolated nucleic acid molecule that encodes the proteins connected with the entire biosynthetic pathway for producing the LL-F28249 compounds.

BRIEF SUMMARY OF THE INVENTION

The present invention concerns the unique cloning and characterization of the complete biosynthetic pathway for the formation of the LL-F28249 compounds and, most importantly, the highly active, major component LL-F28249α. The full DNA gene cluster and its expression in a suitable host enable the efficient production of the highly active natural metabolites and semisynthetic derivatives. Remarkably, the whole biosynthetic pathway is efficiently contained in only three plasmids identified as Cosmid Numbers 11, 36 and 40 (hereinafter referred to as “Cos11,” “Cos36” and “Cos40,” respectively).

BRIEF DESCRIPTION OF THE DRAWINGS

The background of the invention and its departure from the art will be further described hereinbelow with reference to the accompanying drawings, wherein:

FIG. 1 illustrates the construction of the biosynthetic gene cluster for making the LL-F28249 compounds via the gene segments contained within cosmids made according to the present invention. S. cyaneogriseus cosmid libraries are constructed by ligating Sau3A fragments of S. cyaneogriseus genomic DNA into the BamHI site of cosmid vector pSuperCos 1. The resultant cosmid libraries are transformed into E. coli VCS257. Various cosmids are identified by hybridization using the avermectin ketoacyl synthase probe or by a “walking” technique as described herein. The cosmids are characterized by restriction endonuclease mapping and DNA sequencing. The BamHI restriction map of the Fα gene cluster is obtained from analyzing overlapping cosmids and confirmed by DNA sequencing. B denotes a BamHI site.
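A restriction map such as the BamHI map of FIG. 1 can be cross-checked against the assembled sequence by locating each recognition site. The following is a minimal sketch, assuming a plain A/C/G/T string; BamHI (GGATCC) is the enzyme named in FIG. 1, while BstEII (GGTNACC) and AatII (GACGTC), the enzymes used for pKR0.9, are included only to show how a degenerate site is handled. The toy sequence is hypothetical, and fragment sizes treat the start of each site as the cut point, which is adequate for a coarse map but ignores the exact cleavage position within the site.

```python
import re

# Recognition sequences written 5'->3'; N stands for any base.
SITES = {
    "BamHI": "GGATCC",   # the "B" sites of FIG. 1
    "BstEII": "GGTNACC",
    "AatII": "GACGTC",
}


def site_regex(site: str) -> re.Pattern:
    """Compile a recognition sequence; the lookahead also finds overlapping hits."""
    return re.compile("(?=" + site.replace("N", "[ACGT]") + ")")


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of a site on the given strand.

    All three sites above are palindromic, so scanning one strand suffices.
    """
    return [m.start() for m in site_regex(site).finditer(seq.upper())]


def fragment_lengths(seq_len: int, cuts: list[int]) -> list[int]:
    """Fragment sizes of a linear molecule, treating each site start as the cut."""
    bounds = [0] + sorted(cuts) + [seq_len]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAGGATCCTTTTGGATCCAA"  # hypothetical 20 bp sequence
    cuts = find_sites(toy, SITES["BamHI"])
    print(cuts, fragment_lengths(len(toy), cuts))  # [2, 12] [2, 10, 8]
```

The same functions applied to the full 88,400 bp sequence would give predicted BamHI fragment sizes that can be compared with the gel-based map.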

FIG. 2 illustrates the biosynthesis proteins and their positions encoded by the cloned biosynthetic gene cluster for making the LL-F28249 compounds. A contiguous nucleotide sequence of approximately 88 Kbp containing the entire Fα polyketide synthase gene cluster is obtained by sequencing overlapping cosmids and the subclones thereof. The 13 modules and respective domains are identified using BLAST alignment analysis. Other biosynthetic genes are identified in the same way. The following abbreviations are used in the figure: ACP, acyl carrier protein; DH, dehydratase; ER, enoylreductase; KR, ketoreductase; KS, ketoacyl synthase; LD, loading domain; TE, thioesterase; MT, methyl transferase; AT, acyl transferase.

FIG. 3 shows the structure of the vector designated pKR0.9, which consists of the 900 bp BstEII-AatII fragment of pNE57 (containing the desired region of the Fα module 3 ketoreductase domain) cloned into the BstEII-AatII sites of pSL301 (Invitrogen, Carlsbad, Calif.). The following abbreviations are used in the figure: mod3 KR, Fα module 3 ketoreductase domain; amp, the ampicillin resistance marker.

FIG. 4 shows the structure of the plasmid components of the pFDmod3/5.2 series. These plasmids are constructed to combine the site-directed mutations of the Fα module 3 ketoreductase domain with flanking DNA to facilitate homologous integration. The backbone vector is the E. coli-Streptomyces shuttle vector pKC1132. The following abbreviations are used in the figure: mod3 KS, module 3 ketoacyl synthase domain; mod3 AT, module 3 acyl transferase; mod3 DH, module 3 dehydratase; mod3 ER, module 3 enoylreductase; mod3 KR, module 3 ketoreductase domain; apra, apramycin resistance marker.

FIG. 5 shows the structure of the plasmid components of the pFDmod3/4.2 series. These plasmids are derived from the pFDmod3/5.2 series by removing approximately 1 Kbp of flanking DNA to minimize aberrant integration. The following abbreviations are used in the figure: mod3 AT, module 3 acyl transferase; mod3 DH, module 3 dehydratase; mod3 ER, module 3 enoylreductase; mod3 KR, module 3 ketoreductase domain; apra, apramycin resistance marker.

FIG. 6 to FIG. 6-39 show the full-length nucleotide sequence (88400 bp) of the biosynthetic genes for making the LL-F28249 compounds (which corresponds to SEQ ID NO:1).

FIG. 7 represents the putative amino acid sequence (922 aa) of the regulatory protein encoded by the ORF1 gene (which corresponds to SEQ ID NO:2).

FIG. 8 represents the putative amino acid sequence (259 aa) of the thioesterase protein encoded by the ORF2 gene (which corresponds to SEQ ID NO:3).

FIG. 9 represents the putative amino acid sequence (267 aa) of the reductase protein encoded by the ORF3 gene (which corresponds to SEQ ID NO:4).

FIG. 10 to FIG. 10-1 represent the putative amino acid sequence (2341 aa) of the loading domain protein for Mod1 encoded by the ORF4 gene (which corresponds to SEQ ID NO:5).

FIG. 11 to FIG. 11-2 represent the putative amino acid sequence (3723 aa) of the loading domain protein for Mod2-Mod3 encoded by the ORF5 gene (which corresponds to SEQ ID NO:6).

FIG. 12 to FIG. 12-3 represent the putative amino acid sequence (6043 aa) of the loading domain protein for Mod4-Mod7 encoded by the ORF6 gene (which corresponds to SEQ ID NO:7).

FIG. 13 represents the putative amino acid sequence (284 aa) of the methyltransferase protein encoded by the ORF7 gene (which corresponds to SEQ ID NO:8).

FIG. 14 represents the putative amino acid sequence (468 aa) of the p450 protein encoded by the ORF8 gene (which corresponds to SEQ ID NO:9).

FIG. 15 to FIG. 15-3 represent the putative amino acid sequence (5674 aa) of the loading domain protein for Mod8-Mod10 encoded by the ORF9 gene (which corresponds to SEQ ID NO:10).

FIG. 16 to FIG. 16-3 represent the putative amino acid sequence (5166 aa) of the loading domain protein for Mod11-Mod13 encoded by the ORF10 gene (which corresponds to SEQ ID NO:11).

FIG. 17 represents the putative amino acid sequence (254 aa) of the oxidoreductase protein encoded by the ORF11 gene (which corresponds to SEQ ID NO:12).

DETAILED DESCRIPTION OF THE INVENTION

In accordance with the present invention, there is provided a novel, purified and isolated nucleic acid molecule encoding the proteins of the entire biosynthetic pathway for producing the LL-F28249 compounds. The nucleic acid molecule of this invention is isolated from an antibiotic-producing wild-type or mutant Streptomyces. Surprisingly, the complete DNA encoding all of the essential biosynthetic proteins is efficiently packaged in only three cosmids. These three cosmids, Cos11, Cos36 and Cos40, which have been constructed to contain the nucleic acid molecule according to the invention, are sufficient to regenerate the entire biosynthetic pathway for producing the LL-F28249 compounds. Thus, the present invention uniquely provides the entire biosynthetic gene cluster in three cosmids, as a preferred embodiment, which enables a substantially more efficient means for making the active anti-parasitic LL-F28249 compounds, particularly moxidectin, in fewer steps than previously contemplated. This invention overcomes the prior unsuccessful attempts by others to isolate the full biosynthetic gene cluster and satisfies a long-standing need.
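Whether a set of overlapping cosmid inserts jointly spans a gene cluster without gaps can be checked with simple interval arithmetic. The sketch below is illustrative only: the insert coordinates assigned to Cos11, Cos36 and Cos40 are hypothetical, because their boundaries are not stated here, and only the 88,400 bp cluster length is taken from FIG. 6.

```python
# Hypothetical insert coordinates (bp) on an arbitrary genomic axis; the
# actual boundaries of Cos11, Cos36 and Cos40 are not given in the text.
COSMIDS = {
    "Cos11": (5_000, 45_000),
    "Cos36": (38_000, 78_000),
    "Cos40": (70_000, 105_000),
}
CLUSTER = (10_000, 10_000 + 88_400)  # assumed placement of the 88,400 bp cluster


def merge_intervals(intervals):
    """Merge overlapping or abutting half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covers(intervals, region):
    """True if the union of the intervals contains the whole region."""
    region_start, region_end = region
    return any(
        start <= region_start and end >= region_end
        for start, end in merge_intervals(intervals)
    )


if __name__ == "__main__":
    print(merge_intervals(COSMIDS.values()))  # [(5000, 105000)]
    print(covers(COSMIDS.values(), CLUSTER))  # True
    # Dropping the middle cosmid leaves a gap, so coverage fails.
    print(covers([COSMIDS["Cos11"], COSMIDS["Cos40"]], CLUSTER))  # False
```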

The nucleotide sequence of this complete DNA gene cluster is fully described in FIG. 6 to FIG. 6-39 (which corresponds to SEQ ID NO:1). The scope of the invention also embraces its complementary strand, that is, the sequence of complementary nucleotides (A paired with T and C paired with G), and/or the reverse nucleotide sequence (i.e., the same sequence read in the opposite direction, from the 3′ end to the 5′ end instead of from the 5′ end to the 3′ end).
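The complementary and reverse sequences described above can be derived mechanically. A minimal Python sketch, assuming an A/C/G/T string in either case; the example sequence is arbitrary and is not taken from SEQ ID NO:1.

```python
# Watson-Crick pairing; characters other than A/C/G/T (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Complementary bases, read in the same direction as the input."""
    return seq.translate(_COMPLEMENT)


def reverse(seq: str) -> str:
    """The same strand read in the opposite direction."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """The complementary strand written 5'->3' (complement, then reverse)."""
    return reverse(complement(seq))


if __name__ == "__main__":
    example = "ATGGCC"  # arbitrary example, not from SEQ ID NO:1
    print(complement(example))          # TACCGG
    print(reverse(example))             # CCGGTA
    print(reverse_complement(example))  # GGCCAT
```

Applying `reverse_complement` twice returns the original sequence, which is a convenient sanity check on any strand conversion.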

The present invention further includes a nucleic acid sequence that hybridizes to the nucleic acid molecule of SEQ ID NO:1 isolated from the microbial source, or to its complementary strand, and that encodes a protein of the biosynthetic pathway for producing the LL-F28249 compounds. Typical hybridization procedures and conditions, which are well known to those of ordinary skill in the art, are illustrated in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989). While standard or stringent conditions are employed for homologous probes, less stringent hybridization conditions may be used for partially homologous probes that have less than 100% homology with the target nucleic acid sequence. In the latter case, a series of Southern and Northern hybridizations may readily be carried out at different stringencies. For instance, when hybridization is carried out in formamide-containing solvents, preferred conditions hold the temperature constant at about 42° C. and the ionic strength constant, using a solution containing 6×SSC and 50% formamide. Less stringent hybridization conditions may use the same temperature and ionic strength but progressively lower amounts of formamide in the annealing buffer, in a range of about 45% down to 0%. Alternatively, hybridization may be carried out in aqueous solutions containing no formamide. Usually for aqueous hybridization, the ionic strength of the solution is kept constant, often at about 1 M Na⁺, while the annealing temperature may be lowered from about 68° C. to 42° C.

In general, the isolation and characterization of the genomic DNA and the cloned, recombinant DNA from suitable host cells may be done via standard or stringent hybridization techniques, utilizing all or a portion of a nucleotide sequence as a probe to screen an appropriate library. As an alternative approach, oligonucleotide primers designed on the basis of other related, known DNA and protein sequences can be used in polymerase chain reactions to amplify and identify other identical or related sequences. The nucleotides and proteins described herein are isolated and purified to varying degrees by routine methods. Preferably, the proteins are obtained in substantially pure form, but a purity of about 80% to about 90% is acceptable. It is contemplated that the scope of the invention also includes the DNA and proteins that are made by chemical synthesis, which have the same or substantially the same structures as those derived directly from the antibiotic-producing wild-type or mutant Streptomyces and are confirmed by routine testing or standard assays to be involved in the biosynthetic pathway of the LL-F28249 compounds.
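When PCR primers are designed from related sequences, a quick first screen is the primer's GC content and an approximate melting temperature. The sketch below uses the Wallace rule (2 degrees Celsius for each A or T plus 4 degrees for each G or C), a rough estimate for short oligonucleotides that ignores salt and primer concentration; it is not a method stated in this specification, and the primer shown is hypothetical. Streptomyces DNA is GC-rich, so primers derived from it tend to have high GC content and correspondingly high estimates.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G + C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Wallace-rule estimate in degrees C: 2 per A/T plus 4 per G/C.

    A rule of thumb for short oligonucleotides; salt and primer
    concentration are ignored, so treat the value as a first screen only.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primer = "GGCGACCTCGGCCTGACCGA"  # hypothetical 20-mer, not from SEQ ID NO:1
    print(len(primer), round(gc_fraction(primer), 2), wallace_tm(primer))
```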

Additionally, the invention encompasses and fully describes the isolated biosynthesis proteins comprising the amino acid sequences that include, but are not limited to, the regulatory protein encoded by the ORF1 gene (which corresponds to SEQ ID NO:2), the thioesterase protein encoded by the ORF2 gene (which corresponds to SEQ ID NO:3), the reductase protein encoded by the ORF3 gene (which corresponds to SEQ ID NO:4), the loading domain protein for Mod1 encoded by the ORF4 gene (which corresponds to SEQ ID NO:5), the loading domain protein for Mod2-Mod3 encoded by the ORF5 gene (which corresponds to SEQ ID NO:6), the loading domain protein for Mod4-Mod7 encoded by the ORF6 gene (which corresponds to SEQ ID NO:7), the methyltransferase protein encoded by the ORF7 gene (which corresponds to SEQ ID NO:8), the p450 protein encoded by the ORF8 gene (which corresponds to SEQ ID NO:9), the loading domain protein for Mod8-Mod10 encoded by the ORF9 gene (which corresponds to SEQ ID NO:10), the loading domain protein for Mod11-Mod13 encoded by the ORF10 gene (which corresponds to SEQ ID NO:11) and the oxidoreductase protein encoded by the ORF11 gene (which corresponds to SEQ ID NO:12).

The open reading frames of the genomic DNA cluster, which encode the biosynthesis proteins, may be identified using a variety of art-recognized techniques. The techniques include, but are not limited to, computer analysis to locate known start and stop codons, putative reading frame locations based on codon frequencies, similarity alignments to expressed genes in other known Streptomyces strains and the like. In this fashion, the proteins of the invention are identified using the nucleotide sequence of the present invention and the open reading frames or the encoded proteins may then be isolated and purified or, alternatively, synthesized by chemical means. Expressible genetic constructs based on the open reading frames and appropriate promoters, initiators, terminators and the like may be designed and introduced into a suitable host cell to express the protein encoded by the open reading frame.
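As an illustration of the simplest of these techniques, locating start and stop codons, the sketch below scans the three forward reading frames for ATG-initiated open reading frames. It is a simplification rather than the analysis used to define ORF1 to ORF11: it ignores the reverse strand and alternative start codons such as GTG (common in Streptomyces), and the minimum-length cutoff is an arbitrary, assumed parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Find ATG...stop ORFs in the three forward frames.

    Returns (frame, start, end) tuples with a 0-based start and an exclusive
    end that includes the stop codon. Only ATG is treated as a start codon,
    and the first ATG after the previous stop is used.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"  # hypothetical sequence
    print(find_orfs(toy, min_codons=3))  # [(2, 2, 17)]
```

Running the same function on the reverse complement of the sequence would cover the remaining three frames.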

As used herein, the term “proteins” means the polypeptides, the enzymes and the like, as those terms are commonly used in the art, which are encoded by the nucleic acid molecule comprising the biosynthetic pathway for producing the LL-F28249 compounds. The proteins of the invention encompass amino acid chains of varying length, including full-length, wherein the amino acid residues are linked by covalent peptide bonds, as well as the biologically active variants thereof. The proteins may be natural, recombinant or synthetic. For example, the biosynthesis proteins may be made through conventional recombinant technology by inserting a nucleotide sequence that encodes the protein into an appropriate expression vector and expressing the protein in a suitable host cell or through standard chemical synthesis by the Merrifield solid-phase synthesis method described in Merrifield, J. Am. Chem. Soc. 85:2149-2154 (1963), in which the amino acids are individually and sequentially attached to an amino acid chain. Alternatively, modern equipment is commercially available from a variety of manufacturers such as Perkin-Elmer, Inc. (Wellesley, Mass.) for the automated synthesis of proteins.

The biologically active variants that are included within the scope of the present invention comprise, at a minimum, the biologically functional portion of the amino acid sequence encoded by the nucleic acid molecule of the invention. As used herein, the “biologically functional portion” is that part of the protein structure which still retains the active function of the protein, for example, that part of the regulatory protein molecule encoded by the ORF1 gene which has the same or substantially the same activity and/or binding properties (i.e., at least about 90%, and more preferably about 95%, similarity in activity or potency). The biologically active variants of the proteins include active amino acid structures having deleted, substituted or added amino acid residues, naturally occurring alleles, etc. The biologically functional portion may be easily identified by subjecting the full-length protein to chemical or enzymatic digestion to prepare fragments and then testing those fragments in standard assays to determine which part of the amino acid structure retains the same or substantially the same biological activity as the full-length protein.
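The enzymatic digestion step is often planned in silico first. The sketch below applies the commonly used trypsin rule (cleave after lysine or arginine unless the next residue is proline); the rule simplifies real trypsin specificity, trypsin is only one possible choice of enzyme, and the peptide is hypothetical rather than taken from SEQ ID NO:2 through SEQ ID NO:12.

```python
def tryptic_fragments(protein: str) -> list[str]:
    """Simplified in-silico trypsin digest.

    Cleaves after K or R unless the next residue is P; missed cleavages and
    other real-world effects are not modelled.
    """
    fragments, current = [], []
    for i, residue in enumerate(protein):
        current.append(residue)
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in ("K", "R") and next_residue != "P":
            fragments.append("".join(current))
            current = []
    if current:  # C-terminal piece that does not end in K/R
        fragments.append("".join(current))
    return fragments


if __name__ == "__main__":
    peptide = "MAKRPGKLLRVE"  # hypothetical sequence
    print(tryptic_fragments(peptide))  # ['MAK', 'RPGK', 'LLR', 'VE']
```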

The determination of the full biosynthesis gene cluster of Fα, heretofore unknown, is of great commercial significance. The isolation and complete description of the gene cluster according to the present methods permit the enhanced production of the active Fα compound and other natural members of the LL-F28249 family of compounds. Furthermore, the information about the gene cluster enables an improved method for preparing the commercially potent semisynthetic derivatives such as moxidectin more quickly and efficiently than the prior chemical process of manufacture. As a direct and beneficial consequence of the cloning and characterization of the novel Fα biosynthesis gene cluster described herein, unique processes for the direct fermentative production of moxidectin and other important LL-F28249 derivatives using bioengineered strains of S. cyaneogriseus are now obtainable.

One advantage of the present invention is the ability to enhance the production of the highly active Fα from the fermentation broth of S. cyaneogriseus. Cos11 contains a putative transcription activator gene (ORF1) for the PKS cluster. Increasing the expression level of the activator can result in a higher yield of Fα. This can be achieved by increasing the copy number of the gene or by enhancing the regulatory sequence elements for this gene according to known techniques (see, for example, Perez-Llarena et al., Journal of Bacteriology 179:2053-2059 (1997)).

Another benefit derived from obtaining the full biosynthetic gene cluster of the present invention is to enable the efficient fermentative production and manufacture of the natural and semisynthetic derivatives of the LL-F28249 family of compounds such as, for example, LL-F28249α, LL-F28249β, LL-F28249γ, 23-(O-methyloxime)-LL-F28249α (moxidectin), 23-(O-methyloxime)-5-(phenoxyacetoxy)-LL-F28249α, 23-(semicarbazone)-LL-F28249α, 23-(thiosemicarbazone)-LL-F28249α, etc. Through the identification of the biosynthesis genes encoding the proteins responsible for the production of the LL-F28249 compounds and, desirably, the Fα metabolite as the major product, additional cloning and mutagenesis of the pathway readily produces other metabolites as by-products of the fermentation process. The biosynthesis genes are particularly useful to minimize the number of chemical reaction steps in preparing other semisynthetic members of the family.

The highly preferred utility of this invention involves the preparation of the commercially important compound moxidectin in fewer steps than previously done via known chemical processes. Moxidectin is currently produced by a four-step chemical process from Fα, which is first obtained by fermentation of Streptomyces cyaneogriseus subsp. noncyanogenus. The conversion of the natural metabolite Fα to moxidectin involves the following chemical reactions: (1) protection of the 5-hydroxyl group; (2) oxidation of the 23-hydroxyl group to a keto function; (3) conversion of the 23-keto to 23-O-methyloxime group; and (4) deprotection of the 5-hydroxyl group. The efficient method of the present invention now permits the chemical conversion of 23-keto Fα to moxidectin to be accomplished in a single step.

By generating mutants of the biosynthesis gene cluster, the specific activity responsible for reduction of the keto function at position 23 of the LL-F28249 compound structure is eliminated, and the chemical synthesis is reduced to a single step. Surprisingly, the remainder of the modular polyketide synthase remains functional and recognizes the unnatural polyketide intermediate. The bioengineered strain can then be used, cloned and reused for the direct fermentative production of 23-keto Fα, further reducing the normal processing time.

In the examples below, selective mutagenesis illustrates how to modify Fα biosynthesis and to obtain the desired metabolites according to the present methods. Mutants of the module 3 ketoreductase domain of the S. cyaneogriseus Fα biosynthetic gene cluster are generated by site-directed mutagenesis. These ketoreductase variants are designed by comparing the predicted amino acid sequence of the Fα module 3 ketoreductase domain with a number of biologically active ketoreductase domains and with several “cryptic” (inactive) ketoreductase domains. The module 3 ketoreductase domain of the S. cyaneogriseus Fα biosynthetic gene cluster is then replaced with these variant domains by homologous recombination in order to alter Fα biosynthesis and obtain the desired metabolite.
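The comparison step can be pictured as a column-by-column scan of a multiple alignment: positions where the Fα module 3 ketoreductase agrees with every active ketoreductase but differs from every cryptic one are natural candidates for site-directed changes. This is a hypothetical sketch of that idea, not the procedure used in the examples; the alignment is assumed to have been computed beforehand with a separate alignment tool, and the short toy sequences are invented.

```python
def candidate_positions(target, active, cryptic):
    """Columns where `target` matches every active KR but no cryptic KR.

    All sequences must already be aligned to the same length ('-' = gap).
    Returns 1-based (column, target_residue, cryptic_residues) tuples.
    """
    others = list(active) + list(cryptic)
    if any(len(seq) != len(target) for seq in others):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for i, residue in enumerate(target):
        if residue == "-":
            continue  # skip gap columns in the target
        conserved_in_active = all(seq[i] == residue for seq in active)
        absent_in_cryptic = all(seq[i] != residue for seq in cryptic)
        if conserved_in_active and absent_in_cryptic:
            hits.append((i + 1, residue, "".join(seq[i] for seq in cryptic)))
    return hits


if __name__ == "__main__":
    # Invented toy alignment, not real ketoreductase sequences.
    target = "GGYSAK"
    active = ["GGYSAK", "GAYSAK"]
    cryptic = ["GGFSAN", "GGHSAQ"]
    print(candidate_positions(target, active, cryptic))
    # [(3, 'Y', 'FH'), (6, 'K', 'NQ')]
```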

Generally speaking, the site-directed mutagenesis introduces a small deletion or point mutation in the 23-keto (oxo) reductase gene (23-KR gene) to render the 23-ketoreductase domain nonfunctional while retaining the functions of the other domains of the polyketide synthase. Mutations in the 23-KR gene are introduced by standard methods into a wild-type Streptomyces cyaneogriseus subsp. noncyanogenus strain or the mutant Fα production strain 142, resulting in the direct fermentative production of 23-keto (oxo) Fα. In addition, the whole Fα PKS gene cluster carrying mutations in the 23-KR gene may be introduced into a suitable host cell such as S. lividans, S. coelicolor, E. coli and the like to produce 23-keto Fα. The transformed host cells are used as the source of DNA for conjugal transfer to S. cyaneogriseus, using methods described herein, for the further fermentative production of 23-keto Fα.
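Before a point mutation is constructed, its effect on the encoded protein can be checked by translating the wild-type and mutant coding sequences. The sketch below uses the standard genetic code (NCBI translation table 1); the nine-base sequence and the chosen position are hypothetical and do not correspond to the actual 23-KR mutations.

```python
from itertools import product

# NCBI standard genetic code (translation table 1). Codons are enumerated
# with bases in T, C, A, G order, third position varying fastest:
# TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def point_mutation(cds: str, position: int, new_base: str) -> tuple[str, str]:
    """Substitute one base (1-based position) and name the amino-acid change.

    Returns (mutant_sequence, change), where change looks like 'Y2F'.
    """
    if cds[position - 1].upper() == new_base.upper():
        raise ValueError("replacement base is identical to the original")
    mutant = cds[:position - 1] + new_base + cds[position:]
    codon_number = (position - 1) // 3
    before = translate(cds)[codon_number]
    after = translate(mutant)[codon_number]
    return mutant, f"{before}{codon_number + 1}{after}"


if __name__ == "__main__":
    wild_type = "ATGTACAAA"  # hypothetical; encodes M-Y-K
    mutant, change = point_mutation(wild_type, 5, "T")
    print(translate(wild_type), translate(mutant), change)  # MYK MFK Y2F
```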

The imino derivatives (23-oximes) of the 23-oxo compounds are then readily prepared by standard techniques, such as the procedures described by S. M. McElvain in The Characterization of Organic Compounds, Macmillan Company, New York, 1953, pages 204-205, incorporated herein by reference. Typically, the 23-oxo compound is stirred in an alcohol, such as methanol or ethanol, or in dioxane in the presence of acetic acid and an excess of the amino derivatizing agent, such as hydroxylamine hydrochloride, O-methylhydroxylamine hydrochloride, semicarbazide hydrochloride and the like, along with an equivalent amount of sodium acetate, at room temperature to about 50° C. The reaction is usually complete in several hours to several days at room temperature but can readily be accelerated by heating. The subsequent conversion of the 23-keto Fα compound to moxidectin is thus the only chemical reaction required.

It is further contemplated that the genetic material contained within the three cosmids, Cos11, Cos36 and Cos40, may be reduced to fit into two plasmids or a single plasmid through genetic manipulations known to those of ordinary skill in the art. For example, the cloned Fα biosynthesis genes present in Cos11, Cos36 and Cos40 prepared according to the methods of the present invention could be used to assemble the entire polyketide synthase (PKS) gene cluster on two plasmids or a single plasmid. The assembly can be achieved by cloning, PCR, synthetic genes, or a combination of any of these art-recognized techniques. The assembled Fα PKS gene cluster can be introduced into a suitable host cell such as S. lividans, S. coelicolor, E. coli and the like to produce Fα. Thereafter, the assembled PKS gene cluster can be used in a cell-free expression system, such as the one described by Olsthoorn-Tieleman et al., Eur. J. Biochem. 268:3807-3815 (2001), to produce further amounts of Fα and related products.

Guided by the modular organization of the core LL-F28249α polyketide synthase and the functional domains within those modules, the biosynthetic gene cluster described herein is cloned and fully characterized. Generally, for the isolation of the biosynthetic genes, a cosmid library of S. cyaneogriseus genomic DNA is prepared in the commercially available vector pSuperCos (Stratagene, La Jolla, Calif.). This cosmid library is probed with fragments of DNA corresponding to the avermectin module 1 ketoacyl synthase, amplified from S. avermitilis genomic DNA by the polymerase chain reaction. Subsequently, several regions of the Fα biosynthetic gene cluster, amplified by the polymerase chain reaction from previously characterized cosmids, are used as probes to isolate additional cosmids. Using these methods, a series of cosmids is isolated that collectively spans over 100 Kbp of genomic DNA. Complete restriction endonuclease mapping and thorough nucleotide sequence analysis identify the cosmids and yield a definitive, unambiguous contiguous nucleotide sequence of nearly 88 Kbp. Analysis of this sequence reveals 13 complete modules of a modular polyketide synthase together with at least six additional genes involved in the biosynthesis of Fα or in the regulation of that biosynthesis.
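Determining how much genomic DNA a set of overlapping cosmids covers reduces to merging intervals on a common map coordinate. The sketch below is illustrative only: the cosmid names and coordinates are hypothetical and do not correspond to the actual map positions of Cos11, Cos36, Cos40 or any other cosmid described here.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals given in bp on a common map coordinate."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_bp(intervals):
    """Total genomic DNA (bp) covered by the union of the intervals."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Hypothetical cosmid inserts (not the real map positions).
cosmids = {"cosmid_A": (0, 40_000), "cosmid_B": (30_000, 72_000), "cosmid_C": (65_000, 105_000)}
print(merge_intervals(cosmids.values()), covered_bp(cosmids.values()))
```

A single merged interval means the cosmids form one uninterrupted tiling path; more than one interval marks a gap that still needs a bridging clone.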

The invention further embraces biologically functional plasmids or vectors containing the nucleic acid molecule of the present invention. The particular plasmids of the invention are selected for their ability to incorporate large DNA gene clusters but they are conventional and are derived from commonly available vectors, for example, pKR0.9, the pFDmod3/5.2 series, the pFDmod3/4.2 series and the like.

Although E. coli is used as the heterologous host in the examples, heterologous expression of antibiotic biosynthetic genes is expected in a wide range of Actinomycetales, Bacillus, Corynebacteria, Thermoactinomyces and the like, provided they can be transformed with the relatively large plasmid constructs described herein. Suitable hosts include, but are not limited to, Streptomyces lividans, Streptomyces coelicolor, Streptomyces griseofuscus and Streptomyces ambofaciens, which are known to be relatively non-restricting. Preferably, the suitable host cell that is stably transformed or transfected by the plasmid or vector is Streptomyces coelicolor or an Escherichia coli-Streptomyces cosmid vehicle. In vitro expression of the proteins may be performed, if desired, using standard methods in the art.

The following section highlights general methods and materials, available to those of ordinary skill in the art, which have been used to successfully clone and characterize the entire, large biosynthetic pathway of the present invention.

General Methods and Materials

A. Materials, Plasmids and Bacterial Strains

The E. coli-Streptomyces shuttle vector pKC1132, which contains elements required for replication and selection in E. coli and in Streptomyces, including antibiotic resistance markers for selection with apramycin, is used throughout this work (see M. Bierman et al., Gene 116:43-49 (1992)). In addition to pKC1132, commercially available cloning vectors are used as indicated herein. Those of ordinary skill in the art will be able to select other well-known cloning vectors that can readily be substituted for the exemplified vectors and, using standard techniques, to avoid or minimize the instability problems encountered with certain older cosmid-harboring E. coli strains.

Plasmid DNA is manipulated using procedures similar to those established by work on other plasmids. Typical procedures are presented in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989). Typical procedures for Streptomyces are presented in D. A. Hopwood et al., Genetic Manipulation of Streptomyces, A Laboratory Manual, John Innes Foundation Press, Norwich, UK (1985). Specific methods used in this work are described herein unless they are identical to methods presented in the above-referenced laboratory manuals.

E. coli JM109 and DH5α, common laboratory strains used throughout this work, are readily available from a number of commercial sources (for example, Stratagene, La Jolla, Calif.). E. coli XL1-Blue MRF′ strain is obtained from Stratagene (La Jolla, Calif.). E. coli ETS12567 (pUZ8002) is obtained from Professor Heinz Floss, of the Department of Chemistry, University of Washington (Seattle, Wash.). E. coli VCS257 is obtained from Stratagene (La Jolla, Calif.). S. avermitilis is obtained from the American Type Culture Collection under ATCC Deposit Accession No. 31,267 but it can also be obtained from the Agricultural Research Culture Collection (NRRL), 1815 N. University Street, Peoria, Ill. 61604, under NRRL 8165. “Wild-type” Streptomyces cyaneogriseus subsp. noncyanogenus LL-F28249 (NRRL 15773) and the mutant Fα production strain of S. cyaneogriseus designated “S. cyaneogriseus strain 142” are used separately throughout this written disclosure of the present invention but they are interchangeable and may substitute for each other in any given step of the disclosed process. Strain 142, which is derived from the wild-type strain, has undergone classic genetic manipulations to enhance antibiotic production but it retains the same polyketide synthase DNA sequence as the wild-type strain. Because their polyketide synthase sequences are identical, all of the plasmids described herein, including but not limited to Cos11, Cos36 and Cos40, can be derived from wild-type Streptomyces cyaneogriseus subsp. noncyanogenus or S. cyaneogriseus strain 142 with the same result.

B. Restriction Analysis of Plasmid DNA

Procedures for restriction analysis of plasmid DNA, procedures for agarose gel electrophoresis, and other standard techniques of recombinant DNA technology are described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989). Plasmid DNA is digested with restriction endonucleases according to the manufacturer's procedures. Enzymes are obtained from New England Biolabs (Beverly, Mass.), Life Technologies (Rockville, Md.) or Promega (Madison, Wis.). Restriction digests are analyzed by electrophoresis in 0.8% w/v agarose using 40 mM Tris-acetate, 1 mM EDTA as the buffer. Fragment sizes are determined by comparison to DNA fragments of known molecular weight (1 Kb ladder, Life Technologies, Rockville, Md.).
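Sizing fragments against a ladder is conventionally done with a semi-log standard curve: over a limited range, log10(fragment size) is roughly linear in migration distance. The sketch below fits that line by least squares and interpolates an unknown band. The ladder distances and sizes in the usage lines are hypothetical, not measurements from this work, and the linear approximation degrades at the extremes of the gel.

```python
import math


def fit_semilog(distances_mm, sizes_bp):
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    n = len(distances_mm)
    ys = [math.log10(s) for s in sizes_bp]
    mean_x = sum(distances_mm) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size_bp(distance_mm, slope, intercept):
    """Interpolate a fragment size from its migration distance."""
    return 10 ** (slope * distance_mm + intercept)


# Hypothetical marker bands (distance in mm, size in bp) and an unknown band at 27 mm.
ladder_mm = [12.0, 18.0, 24.0, 31.0, 40.0]
ladder_bp = [10000, 6000, 4000, 2000, 1000]
slope, intercept = fit_semilog(ladder_mm, ladder_bp)
print(round(estimate_size_bp(27.0, slope, intercept)))
```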

C. Preparation of Hybridization Probes

Hybridization probes are isolated from plasmids following restriction digestion or are generated using the polymerase chain reaction as described herein. Probes are radiolabeled to high specific activity using EasyTides™ [α-³²P]dCTP (3000 Ci/mmol) from New England Nuclear (Boston, Mass.) and the Rediprime™ II random prime labeling system from Amersham Pharmacia Biotech (Piscataway, N.J.) according to procedures provided by the manufacturer.

Hybridization probes are used to identify cosmids containing the Fα biosynthetic gene cluster (from both the S. cyaneogriseus strain 142 and the wild-type S. cyaneogriseus cosmid libraries), to confirm and characterize transconjugants and excisants, and to generate accurate restriction maps that confirm the identity of the Fα biosynthetic gene cluster. These probes are either generated by PCR amplification or excised from clones, as summarized in the following Table 1.

TABLE 1

| Probe | PCR Primer Sequence or Restriction Sites | Use |
|---|---|---|
| Avermectin KS1 | F: GCCGAATTCCTTCGGCATCAGCCCC; R: GCTCGCACCGTCCTGGTTGACCGC | To isolate cosmids containing the Fα biosynthetic gene cluster (S. cyaneogriseus strain 142) |
| NE5.7 | 5.7 Kbp NotI/EcoRI fragment of Cos7 (contains Fα module 3) | To isolate cosmids containing the Fα biosynthetic gene cluster (wild-type S. cyaneogriseus) |
| Apramycin | 750 bp SacI fragment of pKC1132 | To confirm and characterize transconjugants |
| Mod3 | F: GACAACGTCGGTCCGG; R: CGCGGTGACTCGCTTGAGGTATTC | To confirm and characterize transconjugants, and in restriction mapping |
| Thioesterase | F: GCTTCACCGACCCCTCGGCTATGACC; R: GTGAAGTGGTTGCCGTCGGTTTCGAGG | To restriction map the right end of the Fα biosynthetic gene cluster |
| p450 | F: GATGACGTGCTCACCGATGTCGGTGAGC; R: GACGTGGAAATCATGTACAGCTCGTACG | To restriction map the right end of the Fα biosynthetic gene cluster |
| Cos36 (end) | 500 bp NotI fragment of Cos36 | To restriction map the right end of the Fα biosynthetic gene cluster |
| Cos12 (end) | 1.1 Kbp BamHI/EcoRI fragment of Cos12 | To restriction map the left end of the Fα biosynthetic gene cluster |
| B5.5 | 5.5 Kbp BamHI fragment of Cos11 | To restriction map the left end of the Fα biosynthetic gene cluster, and to isolate cosmids containing the Fα biosynthetic gene cluster (wild-type S. cyaneogriseus) |
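A quick sanity check on the PCR primers in Table 1 is their length, GC content and an approximate melting temperature. The sketch uses the basic rule of thumb Tm = 64.9 + 41 × (nGC − 16.4) / N, which ignores salt concentration and nearest-neighbour effects; it is a rough screening aid we chose for illustration, not the design method used for these primers.

```python
# Primer sequences copied from Table 1.
PRIMERS = {
    "Avermectin KS1 F": "GCCGAATTCCTTCGGCATCAGCCCC",
    "Avermectin KS1 R": "GCTCGCACCGTCCTGGTTGACCGC",
    "Mod3 F": "GACAACGTCGGTCCGG",
    "Mod3 R": "CGCGGTGACTCGCTTGAGGTATTC",
    "Thioesterase F": "GCTTCACCGACCCCTCGGCTATGACC",
    "Thioesterase R": "GTGAAGTGGTTGCCGTCGGTTTCGAGG",
    "p450 F": "GATGACGTGCTCACCGATGTCGGTGAGC",
    "p450 R": "GACGTGGAAATCATGTACAGCTCGTACG",
}


def gc_count(seq: str) -> int:
    """Number of G and C bases in the primer."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def gc_fraction(seq: str) -> float:
    return gc_count(seq) / len(seq)


def basic_tm(seq: str) -> float:
    """Rough Tm (deg C) = 64.9 + 41 * (nGC - 16.4) / N; ignores salt and nearest-neighbour effects."""
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / len(seq)


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, Tm ~{basic_tm(seq):.1f} C")
```

Forward and reverse primers of a pair should have similar estimated Tm values; a large mismatch would suggest lowering the annealing temperature or redesigning one primer.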

Isolation, Maintenance and Propagation of Plasmids

A. Plasmid Isolation

E. coli strains, both untransformed and those transformed with vectors as described herein, are grown using well-established methods similar to those described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989).

Plasmid DNA is isolated from E. coli cultures using reagents and materials obtained from QIAGEN (Valencia, Calif.). Depending on the number of strains being analyzed, the miniprep plasmid isolation systems used include the QIAprep® Spin Miniprep Kit (for plasmid isolation from relatively small numbers of strains), the QIAprep® 8 Turbo Miniprep Kit (for higher-throughput plasmid isolation from somewhat larger numbers of strains), and the QIAprep® 96 Turbo Miniprep Kit (for partially automated isolation of plasmids from strains in 96-well blocks). For the isolation of larger quantities of plasmid DNA from E. coli, reagents and materials included in the QIAGEN Plasmid Midi (up to 100 μg) and Maxi (up to 500 μg) kits, or in the Nucleobond AX-100 (up to 100 μg) kit from Clontech (Palo Alto, Calif.), are used.

B. Transformation of Escherichia coli by Plasmid DNA

Plasmid DNA is transformed into electrocompetent E. coli strains by electroporation or into chemically competent E. coli strains by heat shock using well-established procedures similar to those described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989). Transformants are selected using appropriate antibiotics, and after plasmids are isolated using methods described herein, they are characterized following digestion with restriction endonucleases, again using well-established methods described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989).

C. Conjugal Transfer of Plasmid DNA from Escherichia coli to Streptomyces cyaneogriseus

In all cases, the plasmids of interest are first transformed into the E. coli strain designated ETS12567 (pUZ8002) by electroporation as described herein. This strain is cmʳ, tetʳ, dam⁻ and dcm⁻. Additionally, pUZ8002, an oriT⁻ version of the plasmid pRK2 (see R. Meyer et al., Science 190:1226-1228 (1975)), confers kanʳ. The transformed cells are maintained under appropriate antibiotic selection, including 5 μg/ml kanamycin and 100 μg/ml apramycin. Conjugal transfer of plasmid DNA from these E. coli transformants to S. cyaneogriseus is accomplished using the following two procedures, both modified from a procedure described by M. Bierman et al., Gene 116:43-49 (1992).

Conjugation Method #1: 3 ml of LB medium supplemented with 5 μg/ml kanamycin, 5 μg/ml chloramphenicol and 50 μg/ml apramycin is inoculated with a single well-isolated transformed E. coli colony, and the culture is incubated at 37° C., with shaking at 220 rpm, for 16 hours. In parallel, 10 ml of TSB (27.5 g/L tryptic soy broth, 5 g/L yeast extract, 5 g/L KH₂PO₄, pH 7.0, with 100 ml/L of a sterile 20% (w/v) glucose solution added after autoclaving) is inoculated with 100 μl of a frozen stock of S. cyaneogriseus mycelial fragments, and the culture is incubated at 31° C., with shaking at 220 rpm, for 16 hours. The next day, 10 ml of LB medium supplemented with 50 μg/ml apramycin is inoculated with a 100 μl aliquot of the overnight E. coli culture. At the same time, a 2 ml aliquot of the overnight S. cyaneogriseus culture is vortexed for 2 minutes in a tube containing sterile glass beads and sonicated (three 5-second bursts at 100% output), and 1 ml of this suspension of mycelial fragments is transferred to 9 ml of TSB. Both cultures are incubated at 37° C., with shaking at 220 rpm, until the absorbance at 600 nm of the E. coli culture reaches 0.4-0.6. The cells in each culture are collected by centrifugation, washed twice with LB, and suspended in 500 μl of 2XYT (16 g/L tryptone, 10 g/L yeast extract, 5 g/L NaCl, pH 7.0). Aliquots (100 μl) of the two preparations are combined, the mixture is incubated at 50° C. for 5 minutes, and the cells are collected by centrifugation. The supernatant is removed, and the cell pellet is suspended in 100 ml of 2XYT and plated onto SFM plates (25 g/L Nutrisoy soybean flour, 25 g/L mannitol, 20 g/L agar, 0.462 g/L L-cysteine, 0.462 g/L L-arginine, 0.462 g/L L-proline). These plates are incubated at 37° C. for 16 hours and then overlaid with 1 ml of sterile water containing 0.5 mg of nalidixic acid and 1 mg of apramycin (final concentrations 20 μg/ml and 40 μg/ml, respectively). The plates are incubated at 37° C. until colonies are well established.

Conjugation Method #2: 3 ml of LB medium supplemented with 5 μg/ml kanamycin, 5 μg/ml chloramphenicol and 100 μg/ml apramycin is inoculated with a single well-isolated transformed E. coli colony, and the culture is incubated at 37° C., with shaking at 220 rpm, for 16 hours. In parallel, 25 ml of KB3 medium (10 g/L Bacto-tryptone, 5 g/L yeast extract, 3 g/L beef extract, 1 g/L KH₂PO₄, 1 g/L K₂HPO₄, 1.5 g/L Difco agar, pH 6.8, and 0.5 ml/L of a trace metal solution containing 30 g/L FeSO₄, 30 g/L ZnSO₄·7H₂O, 4 g/L MnSO₄, 4 g/L CuCl₂·5H₂O and 0.4 g/L CoCl₂·6H₂O) is inoculated with 1 ml of a frozen stock of S. cyaneogriseus, and the culture is incubated at 31° C., with shaking at 220 rpm, for 16 hours. The next day, 1 ml of the overnight E. coli culture is combined with 9 ml of LB supplemented with 50 μg/ml apramycin. At the same time, a 5 ml aliquot of the overnight S. cyaneogriseus culture is vortexed for 2 minutes in a tube containing sterile glass beads, and a 2.5 ml aliquot of the homogenized culture is inoculated into 25 ml of KB3 medium. Both cultures are incubated at 37° C. for 3 hours. The cells in each culture are collected by centrifugation and washed twice with water. The E. coli and S. cyaneogriseus cell pellets are suspended in 1 ml and 2 ml, respectively, of TSB (composition as in Method #1). 10 μl of the S. cyaneogriseus suspension and 100 μl of the E. coli suspension are combined with 890 μl of TSB, and 100 μl of the mixture is plated onto AS-1 plates (1 g/L yeast extract, 0.2 g/L L-alanine, 0.2 g/L L-arginine, 0.5 g/L L-asparagine, 5 g/L soluble starch, 2.5 g/L NaCl, 10 g/L Na₂SO₄, 20 g/L agar, pH 7.5) supplemented with 10 mM MgCl₂. These plates are incubated at 37° C. for 16 hours and then overlaid with 3 ml of R2 agar (100 g/L sucrose, 10 g/L glucose, 10 g/L MgCl₂, 0.25 g/L K₂SO₄, 0.1 g/L casamino acids, 25 g/L agar).

Immediately before use, the following solutions are added to each 80 ml flask of R2 agar: 1 ml of 0.5% K₂HPO₄; 8 ml of 3.68% CaCl₂·2H₂O; 1.5 ml of 20% L-proline; 10 ml of 5.73% TES, pH 7.2; 0.5 ml of 1 N NaOH; and 1 ml of a trace element solution containing 40 mg/L ZnCl₂, 200 mg/L FeCl₃·6H₂O, 10 mg/L CuCl₂·2H₂O, 10 mg/L MnCl₂·4H₂O, 10 mg/L Na₂B₄O₇·10H₂O and 10 mg/L (NH₄)₆Mo₇O₂₄·4H₂O. The agar is also supplemented with apramycin and nalidixic acid to final concentrations of 100 μg/ml each. The plates are incubated at 37° C. until colonies are well established.
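The drug quantities in the two conjugation methods can be cross-checked with simple dilution arithmetic. In Method #1, 0.5 mg of nalidixic acid at 20 μg/ml and 1 mg of apramycin at 40 μg/ml both imply a total agar volume of about 25 ml per plate, a figure the text does not state explicitly. In Method #2, the listed additions bring each 80 ml flask of R2 agar to 102 ml. The 50 mg/ml apramycin stock below is a hypothetical value used only to show the C1V1 = C2V2 calculation.

```python
def implied_volume_ml(mass_mg: float, final_ug_per_ml: float) -> float:
    """Total volume (ml) in which `mass_mg` of drug gives `final_ug_per_ml`."""
    return mass_mg * 1000.0 / final_ug_per_ml


def stock_volume_ml(final_ug_per_ml: float, total_ml: float, stock_ug_per_ml: float) -> float:
    """C1*V1 = C2*V2, neglecting the small volume the stock itself adds."""
    return final_ug_per_ml * total_ml / stock_ug_per_ml


# Method #1 overlay: both drugs imply the same total agar volume per plate.
nal_ml = implied_volume_ml(0.5, 20.0)
apr_ml = implied_volume_ml(1.0, 40.0)

# Method #2: volume of one 80 ml flask of R2 agar after the listed additions.
R2_ADDITIONS_ML = [1.0, 8.0, 1.5, 10.0, 0.5, 1.0]
r2_total_ml = 80.0 + sum(R2_ADDITIONS_ML)

# Hypothetical 50 mg/ml (50,000 ug/ml) apramycin stock to reach 100 ug/ml in that flask.
apr_stock_ml = stock_volume_ml(100.0, r2_total_ml, 50_000.0)
print(nal_ml, apr_ml, r2_total_ml, apr_stock_ml)
```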

Using either method, putative transconjugants are repeatedly picked onto fresh plates containing 100 μg/ml apramycin and 100 μg/ml nalidixic acid until they are free of visible contamination by the E. coli strain used as the source of the plasmid.

The purified DNA derived from Streptomyces cyaneogriseus subsp. noncyanogenus, which encodes the entire biosynthetic pathway for the production of the LL-F28249 compounds, has been deposited in connection with the present patent application under the conditions mandated by 37 C.F.R. § 1.808 and maintained pursuant to the Budapest Treaty in the American Type Culture Collection (ATCC), 10801 University Boulevard, Manassas, Va. 20110-2209, U.S.A. More specifically, the purified cosmid DNA, fully described herein and identified as Cos11, Cos36 and Cos40, was deposited in the ATCC on May 24, 2002 and assigned ATCC Patent Deposit Designation Numbers PTA-4392, PTA-4393 and PTA-4394, respectively. It should be appreciated that related purified DNA and other cosmids or plasmids containing related nucleotide sequences, which may be readily constructed using site-directed mutagenesis and the techniques described herein, are also encompassed within the scope of the present invention.

The following examples demonstrate certain aspects of the present invention. However, it is to be understood that these examples are for illustration only and do not purport to be wholly definitive as to conditions and scope of this invention. It should be appreciated that when typical reaction conditions (e.g., temperature, reaction times, etc.) have been given, the conditions both above and below the specified ranges can also be used, though generally less conveniently. The examples are conducted at room temperature (about 23° C. to about 28° C.) and at atmospheric pressure. All parts and percents referred to herein are on a weight basis and all temperatures are expressed in degrees centigrade unless otherwise specified.

A further understanding of the invention may be obtained from the non-limiting examples that follow below.

Example 1 Characterization of the Biosynthetic Gene Cluster for Making LL-F28249 Compounds

A. Isolation and Characterization of Cosmids Containing the Fα Biosynthetic Gene Cluster

1. Construction of Streptomyces cyaneogriseus Cosmid Libraries

Genomic DNA was isolated from S. cyaneogriseus (both wild-type and the Fα production strain designated 142) using a method presented in D. A. Hopwood et al., Genetic Manipulation of Streptomyces, A Laboratory Manual, John Innes Foundation Press, Norwich, UK (1985) (“Isolation of Streptomyces ‘Total’ DNA”: Procedure 3). The S. cyaneogriseus genomic DNA preparation was partially digested with the restriction endonuclease Sau3AI as follows. A reaction mixture containing Sau3AI and genomic DNA was prepared, and at set time points (0, 5, 10, 15, 20, 30, and 45 minutes) aliquots were removed and the reactions were quenched by adding EDTA to a final concentration of 10 mM. A portion of each quenched time point was resolved by electrophoresis through 0.3% w/v agarose at 25 volts for 16 hours. The time point containing DNA fragments predominantly between 23 Kbp and 50 Kbp was selected for the cosmid library. At the same time, pSuperCos 1 (Stratagene, La Jolla, Calif.) was digested with the restriction endonuclease XbaI and dephosphorylated using calf intestine alkaline phosphatase; after ethanol precipitation, the linear vector was digested with the restriction endonuclease BamHI in order to remove one of the Cos sites. The Sau3AI fragments of S. cyaneogriseus genomic DNA were ligated into the linearized, BamHI-treated pSuperCos 1 according to procedures provided by the manufacturer. The resulting recombinant cosmid DNA preparation was packaged using Gigapack® III XL Packaging Extract, and after lysis of the resulting lambda phage particles with chloroform, the cosmid DNA library was transformed into E. coli VCS257. These manipulations were all conducted using reagents, materials, and procedures provided by the manufacturer (Stratagene, La Jolla, Calif.).
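Choosing the partial-digest time point amounts to picking the lane in which the largest share of DNA falls in the cosmid-packageable window of 23-50 Kbp. The sketch below formalizes that choice under an assumed input format (fragment size and relative DNA mass per lane, as might come from gel densitometry); the lane data are hypothetical, and the text describes a visual selection rather than this calculation.

```python
def in_range_fraction(profile, lo_kbp=23.0, hi_kbp=50.0):
    """Fraction of DNA mass in a lane that lies between lo_kbp and hi_kbp."""
    total = sum(mass for _, mass in profile)
    if total == 0:
        return 0.0
    return sum(mass for size, mass in profile if lo_kbp <= size <= hi_kbp) / total


def best_time_point(profiles, lo_kbp=23.0, hi_kbp=50.0):
    """Time point (min) whose lane is most enriched in the cosmid-sized window."""
    return max(profiles, key=lambda t: in_range_fraction(profiles[t], lo_kbp, hi_kbp))


# Hypothetical lanes: time (min) -> [(fragment size in Kbp, relative DNA mass), ...]
lanes = {
    0: [(100.0, 1.0)],
    10: [(60.0, 0.5), (35.0, 0.4), (15.0, 0.1)],
    20: [(35.0, 0.3), (15.0, 0.7)],
}
print(best_time_point(lanes))
```

Under-digested lanes keep most DNA above the window and over-digested lanes push it below, so the optimum usually lies at an intermediate time point.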

2. Isolation of Cosmids Containing the Fα Biosynthetic Gene Cluster

Genomic DNA was isolated from S. avermitilis using a method presented in D. A. Hopwood et al., Genetic Manipulation of Streptomyces, A Laboratory Manual, John Innes Foundation Press, Norwich, UK (1985) (“Isolation of Streptomyces ‘Total’ DNA”: Procedure 3). This genomic DNA preparation was used as a template for amplification of a region of the module 1 ketoacyl synthase domain of the avermectin biosynthetic gene cluster using the polymerase chain reaction. The oligonucleotide primers were designed on the basis of avermectin biosynthetic gene cluster nucleotide sequences deposited in public databases. Colony lifts of the S. cyaneogriseus strain 142 cosmid library were screened for hybridization to the avermectin ketoacyl synthase probe, and more than 30 cosmids potentially containing type I polyketide synthase DNA were isolated. Initially, these cosmids were analyzed by agarose gel electrophoresis following digestion with BamHI, by Southern blot hybridization with the avermectin module 1 ketoacyl synthase probe, and by limited nucleotide sequence analysis. Comparison of these data to data reported by MacNeil and colleagues (see D. J. MacNeil et al., Gene 115:119-125 (1992) and D. J. MacNeil et al., Annals of the New York Academy of Sciences 721:123-132 (1994)) suggested that two of these cosmids (designated Cos7 and Cos11) appeared to span the majority of the Fα biosynthetic gene cluster. The limited data presented by MacNeil and colleagues were also used as the initial basis for isolating a 5.7 Kbp NotI-EcoRI fragment that included most of module 3. A clone of this fragment was prepared (designated pNE57), and its nucleotide sequence was determined in its entirety. This fragment of the Fα biosynthetic gene cluster (from genomic DNA isolated from the Fα production strain) was then used as a probe to screen the wild-type S. cyaneogriseus cosmid library, and 45 cosmids potentially containing type I polyketide synthase DNA were isolated. These cosmids were extensively mapped with BamHI, NotI and EcoRI using methods described herein, and on the basis of comparison of those restriction maps to the incomplete data presented by MacNeil and colleagues, two cosmids from the wild-type strain (designated Cos36 and Cos40) that appeared to span the majority of the Fα biosynthetic gene cluster were identified.

In order to identify cosmids spanning the “ends” of the Fα biosynthetic gene cluster, but not containing significant stretches of core polyketide synthase DNA, the following strategy was employed. A 5.5 Kbp BamHI fragment isolated from Cos11 (from S. cyaneogriseus strain 142) was used to reprobe the wild-type S. cyaneogriseus cosmids that had been selected previously in order to identify additional cosmids that would extend the cluster to the “left.” A number of cosmids were identified that hybridized to the probe, and after restriction mapping, one of these, Cos14, was identified that would support extending the cluster the furthest to the left. A 500 bp NotI fragment isolated from the 3′ end of Cos36 was used to reprobe the wild-type S. cyaneogriseus cosmid library in order to identify additional cosmids that would extend the cluster to the “right.” A number of additional cosmids were identified that hybridized to the probe, and after restriction mapping, one of these, Cos50, was identified that would support extending the cluster the furthest to the “right.”
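The selection of Cos14 and Cos50 can be viewed as one step of a chromosome walk: among cosmids that overlap the already-mapped region (standing in here for "hybridizes to the end probe"), keep the one whose insert reaches furthest beyond each end. The sketch below assumes all cosmids have been placed on a common coordinate by restriction mapping; the names and positions are hypothetical.

```python
def furthest_extensions(cosmids, known_left, known_right):
    """Return (name extending furthest left, name extending furthest right).

    `cosmids` maps name -> (start, end) in bp on a common map coordinate. Only
    cosmids overlapping the known region [known_left, known_right] are eligible,
    as a stand-in for hybridization to an end probe.
    """
    eligible = {n: iv for n, iv in cosmids.items() if iv[0] < known_right and iv[1] > known_left}
    if not eligible:
        raise ValueError("no cosmid overlaps the known region")
    left = min(eligible, key=lambda n: eligible[n][0])
    right = max(eligible, key=lambda n: eligible[n][1])
    return left, right


# Hypothetical candidates relative to an ~88 Kbp known region at [0, 88000].
candidates = {
    "cand_A": (-30_000, 10_000),
    "cand_B": (-20_000, 20_000),
    "cand_C": (70_000, 110_000),
    "cand_D": (120_000, 150_000),  # no overlap: would not hybridize to the end probe
}
print(furthest_extensions(candidates, 0, 88_000))
```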

3. Restriction Mapping Cosmids Containing the Fα Biosynthetic Gene Cluster

Initially, more than 30 cosmids from the S. cyaneogriseus strain 142 cosmid library that hybridized to the avermectin ketoacyl synthase probe, and 45 cosmids from the wild-type S. cyaneogriseus cosmid library that hybridized to the Fα module 3 probe (pNE57), were mapped following digestion with BamHI, NotI, and EcoRI. On the basis of this preliminary analysis, and on the basis of comparison of the restriction maps to the incomplete data presented by MacNeil and his colleagues (see D. J. MacNeil et al., Gene 115:119-125 (1992) and D. J. MacNeil et al., Annals of the New York Academy of Sciences 721:123-132 (1994)), several cosmids were selected for more comprehensive analysis. These cosmids (designated Cos7 and Cos11 from S. cyaneogriseus strain 142; and Cos12, Cos14, Cos36, Cos40 and Cos50 from wild-type S. cyaneogriseus) were carefully mapped following single digestion with BamHI, NotI, and EcoRI and double digestion with BamHI/MluI, NotI/EcoRI, BamHI/EcoRI, SacI/EcoRI, and NotI/MluI. To resolve ambiguities observed in the restriction maps, subclones of these cosmids were constructed as summarized in the following Table 2, and these subclones were extensively mapped as described above.

TABLE 2

| Designation | Subcloned from | Vector | Restriction Sites/Size |
|---|---|---|---|
| pB5.5 | Cos11 | pZeroBlunt | BamHI/5.5 Kbp |
| pB18.0 | Cos11 | pUC19 | BamHI/18.0 Kbp |
| pBE15.0 | Cos12 | pBluescript KS | BamHI/EcoRI/15.0 Kbp |
| pB2.5 | Cos14 | pBluescript KS | BamHI/2.5 Kbp |
| pB5.5 | Cos14 | pZeroBlunt | BamHI/5.5 Kbp |
| pBB14.0 | Cos14 | pBluescript KS | BamHI/BglII/14.0 Kbp |
| pM14.0 | Cos14 | pLitmus38 | MluI/14.0 Kbp |
| pN2.0 | Cos14 | pBluescript KS | NotI/2.0 Kbp |
| pN4.3 | Cos14 | pBluescript KS | NotI/4.3 Kbp |
| pS1.45 | Cos14 | pBluescript KS | SacI/1.45 Kbp |
| pS8.2 | Cos14 | pBluescript KS | SacI/8.2 Kbp |
| pS2.0 | Cos14 | pLitmus38 | SphI/2.0 Kbp |
| pB11.5 | Cos36 | pBluescript KS | BamHI/11.5 Kbp |
| pBE4.8 | Cos36 | pBluescript KS | BamHI/EcoRI/4.8 Kbp |
| pM4.6 | Cos36 | pLitmus38 | MluI/4.6 Kbp |
| pN1.6 | Cos36 | pBluescript KS | NotI/1.6 Kbp |
| pN4.8 | Cos36 | pBluescript KS | NotI/4.8 Kbp |
| pBE5.3 | Cos40 | pBluescript KS | BamHI/EcoRI/5.3 Kbp |
| pN5.2 | Cos50 | pBluescript KS | NotI/5.2 Kbp |
| pN10.0 | Cos50 | pBluescript KS | NotI/10.0 Kbp |
| pS3.3 | Cos50 | pBluescript KS | SacI/3.3 Kbp |
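Restriction mapping with single and double digests relies on comparing observed fragment sizes with those predicted by a candidate map. For a circular cosmid or subclone, each set of cut sites divides the circle into fragments whose sizes sum to the molecule's length. The sketch below computes predicted fragment sizes from cut-site coordinates; the 10 Kbp molecule and the BamHI and EcoRI site positions are hypothetical.

```python
def circular_fragments(length_bp, cut_positions_bp):
    """Fragment sizes (bp, largest first) from cutting a circular molecule.

    Positions are 0-based coordinates on the circle; duplicate sites are ignored.
    A single site simply linearizes the molecule.
    """
    cuts = sorted({p % length_bp for p in cut_positions_bp})
    if not cuts:
        return [length_bp]  # no sites: the molecule stays intact (circular)
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(length_bp - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sorted(sizes, reverse=True)


# Hypothetical 10 Kbp subclone with two BamHI sites and one EcoRI site.
bamhi = [1_000, 4_000]
ecori = [6_000]
print(circular_fragments(10_000, bamhi))          # [7000, 3000]
print(circular_fragments(10_000, bamhi + ecori))  # [5000, 3000, 2000]
```

A candidate map is consistent when every single and double digest it predicts matches the observed bands within gel resolution; an ambiguity of the kind resolved by subcloning shows up as two site arrangements that predict the same band pattern.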

B. Nucleotide Sequence of the Fα Biosynthetic Gene Cluster

1. Sequencing Strategy

The vast majority of the nucleotide sequence data was obtained by end-sequencing random, size-selected sublibraries of cosmid DNA prepared as described herein. Random sublibraries were sequenced until the expected coverage reached 8-10× redundancy over the entire fragment of DNA. To obtain nucleotide sequence for regions of the biosynthetic gene cluster that were underrepresented in the random sublibraries, or that were otherwise difficult to sequence, two other sequencing strategies were used. In the first, PCR products spanning the region of interest of the gene cluster were generated. These PCR products were sequenced directly using the PCR primers as sequencing primers, or were cloned into the commercially available PCR product cloning vector pTOPO TA (Invitrogen, Carlsbad, Calif.) and sequenced using universal primers. In the second, sequencing primers were synthesized to obtain nucleotide sequence by “walking” through regions of interest on cosmids or on subclones prepared from the cosmids. Throughout, nucleotide sequence was obtained on Applied Biosystems Model 377 automated sequencers using ABI PRISM® BigDye™ Terminator Cycle Sequencing Ready Reaction reagents and materials according to detailed procedures provided by the manufacturer (Applied Biosystems, a Division of Perkin Elmer, Foster City, Calif.). Nucleotide sequence data were collected and analyzed using the standard “Collection” and “Sequencing Analysis” algorithms (Applied Biosystems, a Division of Perkin Elmer, Foster City, Calif.). Nucleotide sequence assemblies were generated using the SeqMan™ II sequence analysis package, commercially available from DNASTAR (Madison, Wis.), and the custom Finch™-300 Assembly Server developed for us by Geospiza (Seattle, Wash.).
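The 8-10× redundancy target can be turned into a read budget with the standard Lander-Waterman relation c = N·L/G (coverage c, N reads of length L, target length G). Under its idealized Poisson assumption, the expected fraction of bases left uncovered is about e^(-c). The sketch assumes a ~40 Kbp cosmid insert and ~500 bp usable reads, neither of which is stated in the text. Real coverage is less uniform than this model predicts, which is one reason PCR products and primer walking were needed to close the remaining gaps.

```python
import math


def reads_needed(target_bp: int, read_len_bp: int, coverage: float) -> int:
    """Number of reads N such that N * L / G reaches the requested coverage."""
    return math.ceil(coverage * target_bp / read_len_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)


# Assumptions: ~40 Kbp cosmid insert and ~500 bp usable read length.
for c in (8, 10):
    n = reads_needed(40_000, 500, c)
    # End-sequencing gives two reads per sublibrary clone.
    print(c, n, math.ceil(n / 2), f"{expected_uncovered_fraction(c):.1e}")
```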

Two cosmids (designated Cos36 and Cos40) that appeared on the basis of extensive restriction mapping to span the majority of the Fα biosynthetic gene cluster were isolated from the wild-type S. cyaneogriseus cosmid library. These cosmids were sequenced in their entirety by end-sequencing random, size selected sublibraries that were prepared as described herein. In addition, random, size selected sublibraries prepared from the inserts in several subclones (as summarized in the following Table 3) were also sequenced. Finally, the majority of the subclones generated to support comprehensive restriction mapping of the Fα biosynthetic gene cluster were end-sequenced using universal primers.

TABLE 3

| Designation | Subcloned from Cosmid | Restriction Sites/Size |
|---|---|---|
| pNE57 | Cos7 (S. cyaneogriseus strain 142) | NotI-EcoRI/5.7 Kbp |
| pNE57 | Cos40 (wild-type S. cyaneogriseus) | NotI-EcoRI/5.7 Kbp |
| pB5.5 | Cos14 | BamHI/5.5 Kbp |
| pN4.3 | Cos14 | NotI/4.3 Kbp |
| pN10.0 | Cos50 | NotI/10.0 Kbp |
| pS8.2 | Cos14 | SacI/8.2 Kbp |

2. Construction of Sublibraries for Nucleotide Sequence Analysis

To generate large quantities of the inserts present in cosmids and in the subclones derived from those cosmids, large quantities of plasmid DNA were required. Media (typically 1 L) were inoculated with the clone of interest, and incubated at 37° C. overnight. Plasmid (cosmid) DNA was isolated from these cultures using materials and reagents included in the QIAGEN Plasmid Midi (up to 100 μg) and Maxi (up to 500 μg) kits, or reagents and materials included in the Nucleobond AX-100 (up to 100 μg) kit from Clontech (Palo Alto, Calif.). The inserts present in these plasmids (cosmids) were excised by digestion with appropriate restriction endonucleases, and the fragments were resolved by electrophoresis through 0.8% w/v agarose. The desired fragments were excised from these gels, and the DNA contained in those bands was isolated using reagents, materials, and procedures included in the QIAEX II® (for fragments larger than 10 Kbp) or QIAquick II (for fragments smaller than 10 Kbp) Gel Extraction Systems from QIAGEN (Valencia, Calif.). Then, the DNA was randomly sheared by sonication using a Microson cell disrupter at 10% output. Sonication times were optimized in order to generate fragments of the desired size (typically about 18 seconds for larger inserts isolated from cosmids, and about eight seconds for the smaller fragments isolated from plasmid subclones of those cosmids). Following ethanol precipitation, the DNA fragments were “blunted” using T4 DNA polymerase (New England Biolabs, Beverly, Mass.) in 25 μl reaction volumes containing 2.5 μl of 10×T4 DNA polymerase reaction buffer, 1 μl of 25 μg/ml BSA, and 1.5 μl of T4 DNA polymerase. The reaction mixtures were incubated at 16° C. for 20 minutes, and resolved by electrophoresis through 0.8% w/v agarose. 
The region of the gel containing DNA between 1.5 Kbp and 2.5 Kbp (by comparison to DNA fragments of known molecular weight) was excised, and the DNA was extracted from the agarose using reagents, materials, and procedures included in the QIAquick II Gel Extraction System from QIAGEN (Valencia, Calif.). Purified DNA was collected by ethanol precipitation and resuspended in 8 μl of water. These DNA fragments were then cloned into pCR®-Blunt, and the ligated products were transformed into chemically competent E. coli TOP10 using reagents, materials and procedures provided by the manufacturer (Invitrogen, Carlsbad, Calif.). Colonies were picked and used to inoculate 2 ml LB media supplemented with 50 μg/ml kanamycin, in 96-well deep well blocks. Plasmid DNA was purified from each of these cultures using reagents, materials and procedures included in QIAprep® 96 Turbo Miniprep Kits. Although the frequency of clones with insert generally exceeded 90%, each plasmid was digested with EcoRI and the fragments were resolved by electrophoresis through 0.8% w/v agarose in order to determine whether an insert of the desired size was present. Clones that did contain desired inserts were sequenced using universal sequencing primers as described herein.
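The EcoRI screen described above is a size filter on the insert. A minimal sketch, assuming the digest gives one band for the vector backbone plus one or more insert bands (an insert with an internal EcoRI site yields several). The vector size, the ±0.2 Kbp tolerance and the gel readings are hypothetical inputs, not values from this specification:

```python
def insert_size_kbp(bands_kbp, vector_kbp, tol=0.2):
    """Return the summed size of non-vector EcoRI bands, or None if no vector band is seen.

    vector_kbp and tol are user-supplied assumptions, not values from the text.
    """
    remaining = sorted(bands_kbp, reverse=True)
    for i, band in enumerate(remaining):
        if abs(band - vector_kbp) <= tol:
            # Drop one band matching the vector backbone; the rest is insert.
            return sum(remaining[:i] + remaining[i + 1:])
    return None


def has_desired_insert(bands_kbp, vector_kbp, lo=1.5, hi=2.5):
    """True when the insert falls in the 1.5-2.5 Kbp window excised from the gel."""
    size = insert_size_kbp(bands_kbp, vector_kbp)
    return size is not None and lo <= size <= hi


# Hypothetical gel readings (Kbp), assuming a 3.5 Kbp vector backbone.
print(has_desired_insert([3.5, 2.0], vector_kbp=3.5))       # True: single insert band
print(has_desired_insert([3.5, 1.2, 0.9], vector_kbp=3.5))  # True: internal EcoRI site
print(has_desired_insert([3.5], vector_kbp=3.5))            # False: empty vector
```

Summing the non-vector bands keeps clones whose inserts happen to carry an internal EcoRI site, which a single-band check would wrongly reject.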

3. Identification of Biosynthetic Modules and Domains within Modules

Many modular polyketide biosynthetic gene clusters have been characterized and manipulated, and a large number of their nucleotide sequences have been deposited in the public databases. In general, the modules of such clusters, and the domains within those modules, can be identified by BLAST searches against the public databases, and extensive use of those databases was made to facilitate the present analysis of the Fα biosynthetic gene cluster (see S. F. Altschul et al., Nucleic Acids Research 25:3389-3402 (1997)). In addition, a recent literature reference that summarizes methods for identifying modular polyketide synthase domains, and in particular describes how to differentiate malonyl-class from methylmalonyl-class acyltransferase domains, was used (S. J. Kakavas et al., Journal of Bacteriology 179:7515-7522 (1997)). Leadlay and colleagues originally described methods for differentiating malonyl-class from methylmalonyl-class acyltransferase domains (see T. Schwecke et al., Proceedings of the National Academy of Sciences USA 92:7839-7843 (1995)).
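To illustrate how the two acyltransferase classes can be told apart from sequence alone, the sketch below uses a single, commonly cited signature: HAFH in malonyl-specific AT domains and YASH in methylmalonyl-specific ones. Relying on one motif per class is a simplifying assumption made for illustration; the cited methods weigh several regions of the alignment. The example peptides are hypothetical and are not taken from SEQ ID NO:1.

```python
# One signature motif per AT class (a simplifying assumption for this sketch).
SIGNATURES = {
    "ATm (malonyl-class)": "HAFH",
    "ATmm (methylmalonyl-class)": "YASH",
}


def classify_at(domain_seq: str) -> str:
    """Assign an AT domain to a class only if exactly one signature motif occurs."""
    seq = domain_seq.upper()
    hits = [label for label, motif in SIGNATURES.items() if motif in seq]
    return hits[0] if len(hits) == 1 else "unassigned"


# Hypothetical peptide fragments, for illustration only.
print(classify_at("GQVLHAFHSPLME"))  # ATm (malonyl-class)
print(classify_at("GQVRYASHSPAVE"))  # ATmm (methylmalonyl-class)
print(classify_at("GQVRSASHSPAVE"))  # unassigned
```

Returning "unassigned" when neither or both motifs match leaves ambiguous domains for manual inspection instead of forcing a call.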

A description of the five open reading frames that together encode the loading domain and the 13 modules of the polyketide synthase is given in Table 4 below. For each open reading frame, the position in the Fα biosynthetic gene cluster (in nucleotides) and the length (in amino acids) of the predicted gene product are shown, together with the approximate location of each biosynthetic domain within that gene product (again in amino acids). Abbreviations used are as follows: ACP, acyl carrier protein; ATm, malonyl-class acyltransferase; ATmm, methylmalonyl-class acyltransferase; DH, dehydratase; ER, enoylreductase; KR, ketoreductase; KS, ketoacyl synthase; LD, loading domain; TE, thioesterase.

TABLE 4
ORF4: nt 12850-19875 (2341 aa); Designation: Loading Domain-Mod1
  ATmm-LD aa 22-350
  ACP-LD aa 365-450
  KS-1 aa 473-897
  ATmm-1 aa 1006-1339
  DH-1 aa 1359-1547
  KR-1 aa 1865-2052
  ACP-1 aa 2137-2223
ORF5: nt 19865-31036 (3723 aa); Designation: Mod2-Mod3
  KS-2 aa 34-466
  ATmm-2 aa 574-908
  KR-2 aa 1211-1391
  ACP-2 aa 1473-1559
  KS-3 aa 1578-2005
  ATm-3 aa 2136-2476
  DH-3 aa 2486-2667
  ER-3 aa 2925-3279
  KR-3 aa 3287-3466
  ACP-3 aa 3556-3640
ORF6: nt 31115-49246 (6043 aa); Designation: Mod4-Mod7
  KS-4 aa 34-456
  ATm-4 aa 582-907
  ACP-4 aa 950-1031
  KS-5 aa 1055-1481
  ATm-5 aa 1613-1938
  KR-5 aa 2247-2427
  ACP-5 aa 2516-2601
  KS-6 aa 2621-3047
  ATm-6 aa 3168-3493
  KR-6 aa 3802-3983
  ACP-6 aa 4078-4164
  KS-7 aa 4189-4615
  ATmm-7 aa 4727-5056
  DH-7 aa 5078-5257
  KR-7 aa 5588-5768
  ACP-7 aa 5868-5952
ORF9: nt 52809-69833 (5674 aa); Designation: Mod8-Mod10
  KS-8 aa 39-465
  ATmm-8 aa 574-904
  DH-8 aa 926-1106
  ER-8 aa 1366-1718
  KR-8 aa 1726-1908
  ACP-8 aa 1995-2080
  KS-9 aa 2102-2529
  ATm-9 aa 2661-2986
  DH-9 aa 3009-3188
  KR-9 aa 3492-3674
  ACP-9 aa 3753-3842
  KS-10 aa 3864-4290
  ATmm-10 aa 4402-4732
  DH-10 aa 4753-4928
  KR-10 aa 5234-5416
  ACP-10 aa 5499-5586
ORF10: nt 69929-85429 (5166 aa); Designation: Mod11-Mod13
  KS-11 aa 34-456
  ATm-11 aa 578-916
  KR-11 aa 1199-1380
  ACP-11 aa 1464-1549
  KS-12 aa 1570-1996
  ATmm-12 aa 2105-2442
  KR-12 aa 2724-2906
  ACP-12 aa 2992-3076
  KS-13 aa 3096-3519
  ATm-13 aa 3631-3975
  DH-13 aa 4003-4188
  KR-13 aa 4505-4687
  ACP-13 aa 4780-4866
  TE-13 aa 4893-5167
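Table 4 can also be read as a data structure. The hypothetical helper below, which is not part of the disclosed method, records the AT class of each extension module (the loading-domain AT is deliberately left out) and maps it to the extender unit that class is generally expected to load: malonyl-CoA for ATm and methylmalonyl-CoA for ATmm. It assumes the one-to-one mapping implied by the abbreviations above.

```python
# AT class of each extension module, transcribed from Table 4.
MODULE_AT = {
    1: "ATmm", 2: "ATmm", 3: "ATm", 4: "ATm", 5: "ATm", 6: "ATm", 7: "ATmm",
    8: "ATmm", 9: "ATm", 10: "ATmm", 11: "ATm", 12: "ATmm", 13: "ATm",
}
# Assumed mapping from AT class to the extender unit it loads.
EXTENDER = {"ATm": "malonyl-CoA", "ATmm": "methylmalonyl-CoA"}


def extender_units():
    """Predicted extender unit for modules 1-13, in module order."""
    return [EXTENDER[MODULE_AT[m]] for m in sorted(MODULE_AT)]


def count_extenders():
    """Tally how many modules are predicted to load each extender unit."""
    counts = {}
    for unit in extender_units():
        counts[unit] = counts.get(unit, 0) + 1
    return counts


print(count_extenders())
```

On the Table 4 assignments this gives six methylmalonyl-class and seven malonyl-class extension modules.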

4. Identification of Other Biosynthetic Pathway Genes

Whether the other open reading frames found clustered with the core modular polyketide synthase genes play a role in Fα biosynthesis, and if so what that role might be, was assessed by BLAST comparison of the nucleotide and predicted amino acid sequences of these open reading frames to sequences deposited in the public databases (see S. F. Altschul et al., Nucleic Acids Research 25:3389-3402 (1997)). Using those methods, at least six other genes that could be involved in Fα biosynthesis were tentatively identified.

A description of six additional open reading frames that could be involved in Fα biosynthesis, together with three flanking open reading frames that do not appear to be (ORFA, ORFB and ORFX), is given in Table 5 below. For each open reading frame, the position in the Fα biosynthetic gene cluster (in nucleotides) and the length (in amino acids) of the predicted gene product are shown. A brief description of the BLAST results used to assign a putative functional role to each open reading frame follows the table.

TABLE 5
ORFA: nt 382-2514 (711 aa); Designation: K⁺-Translocating ATPase, Subunit B (Not related to Fα Biosynthetic Gene Cluster)
ORFB: nt 2511-4175 (555 aa); Designation: K⁺-Translocating ATPase, Subunit A (Not related to Fα Biosynthetic Gene Cluster)
ORF1: nt 7697-10465 (922 aa); Designation: Regulatory Protein
ORF2: nt 10791-11570 (259 aa); Designation: Thioesterase
ORF3: nt 11659-12462 (267 aa); Designation: Reductase
ORF7: nt 50449-51303 (284 aa); Designation: Methyltransferase
ORF8: nt 51300-52706 (468 aa); Designation: p450
ORF11: nt 85574-86338 (254 aa); Designation: Oxidoreductase
ORFX: nt 87037-88293 (419 aa); Designation: Endo-1,3-β-glucosidase (Not related to Fα Biosynthetic Gene Cluster)

ORFA and ORFB: BLAST results reveal considerable homology between ORFA and ORFB and K⁺-translocating ATPase subunits B and A, respectively, particularly the Mycobacterium tuberculosis genes (nucleotide sequences of which were directly submitted to the public databases). These genes are unrelated to the Fα biosynthetic gene cluster.

ORF1: BLAST results suggest that at the nucleotide level, ORF1 is related to a putative transcriptional activator in the pikCD operon of a macrolide biosynthetic gene cluster from S. venezuelae (see Y. Xue et al., Proceedings of the National Academy of Sciences USA 95:12111-12116 (1998)), and a putative regulatory protein in a Type-I polyketide synthase biosynthetic gene cluster from the rapamycin producing organism, S. hygroscopicus (see X. Ruan et al., Gene 203: 1-9 (1997)). At the predicted amino acid sequence level, the gene product exhibits limited homology to a family of hypothetical transcriptional activators related to the E. coli narL gene product. On the basis of these BLAST results, ORF1 appears to encode a transcriptional activator.

ORF2: BLAST results reveal significant homology between ORF2 and thioesterases at both the nucleotide and predicted amino acid sequence levels, including thioesterases in the Amycolatopsis mediterranei rifamycin biosynthetic gene cluster (see P. R. August et al., Chemistry & Biology 5:69-79 (1998)), and the S. griseus candicidin biosynthetic gene cluster (see L. M. Criado et al., Gene 126:135-139 (1993)). On the basis of these BLAST results, ORF2 appears to encode a thioesterase.

ORF3: An analysis of BLAST results suggests that ORF3 is homologous to reductases in the S. cyanogenus S136 landomycin biosynthetic gene cluster (see L. Westrich et al., FEMS Microbiology Letters 170:381-387 (1999)). At the predicted amino acid sequence level, BLAST results reveal homology between the ORF3 gene product and an oxidoreductase responsible for the conversion of versicolorin A to sterigmatocystin in the Aspergillus parasiticus aflatoxin biosynthetic pathway (see C. D. Skory et al., Applied and Environmental Microbiology 58:3527-3537 (1992)). On the basis of these BLAST results, ORF3 appears to encode a reductase.

ORF7: BLAST results reveal significant homology between ORF7 and methyltransferases at the nucleotide level, including methyltransferases in the S. lavendulae mitomycin C biosynthetic gene cluster (see Y. Q. Mao et al., Chemistry & Biology 6:251-263 (1999)) and the Saccharopolyspora erythraea erythromycin biosynthetic gene cluster (see S. F. Haydock et al., Molecular and General Genetics 230:120-128 (1991)). On the basis of these BLAST results, ORF7 appears to encode a methyltransferase.

ORF8: BLAST results reveal limited homology between ORF8 and putative cytochrome P450s, including P450s in the S. roseofulvus frenolicin biosynthetic gene cluster and the S. pristinaespiralis pristinamycin biosynthetic gene cluster (see V. de Crecy-Lagard et al., Journal of Bacteriology 179:705-713 (1997)). At the predicted amino acid sequence level, ORF8 exhibits homology to a large family of mammalian cytochrome P450s. On the basis of these BLAST results, ORF8 appears to encode a cytochrome P450.

ORF11: BLAST results reveal significant homology between ORF11 and oxidoreductases at both the nucleotide and predicted amino acid sequence levels, including oxidoreductases in the S. violaceoruber granaticin biosynthetic gene cluster (see D. H. Sherman et al., EMBO Journal 8:2717-2725 (1989)), and the S. cinnamonensis monensin biosynthetic gene cluster (see T. J. Arrowsmith et al., Molecular and General Genetics 234:254-264 (1992)). On the basis of these BLAST results, ORF11 appears to encode an oxidoreductase.

ORFX: BLAST results reveal homology between ORFX and a glucan endo-1,3-β-glucosidase from Oerskovia xanthineolytica (see S. H. Shen et al., Journal of Biological Chemistry 266:1058-1063 (1991)). This gene is unrelated to the Fα biosynthetic gene cluster.

There are several open reading frames in the 3.5 Kbp region between ORFB and ORF1 which, on the basis of nucleotide sequence characteristics (G+C content, potential ribosome binding sites), appear to encode proteins. BLAST analysis, however, does not reveal significant homology between the predicted amino acid sequences of these hypothetical proteins and sequences of proteins that have been deposited in public databases. Consequently, a functional role for these hypothetical proteins in Fα biosynthesis cannot be assigned on the basis of their nucleotide (or predicted amino acid) sequence alone. In addition, there are a number of open reading frames in the 7.8 Kbp region between ORFX and the end of the nucleotide sequence obtained to date. Because ORFX encodes a gene that does not appear to play a role in Fα biosynthesis, and because macrolide biosynthetic genes are typically clustered, the hypothetical proteins encoded by the open reading frames beyond ORFX are not expected to participate in Fα biosynthesis.
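The nucleotide coordinates in Tables 4 and 5 can be cross-checked against the listed protein lengths. A minimal sketch, assuming the coordinates are inclusive and include the stop codon (a convention inferred here, not stated in the tables):

```python
def predicted_length_aa(start_nt: int, end_nt: int) -> int:
    """Residues encoded by an ORF whose inclusive coordinates include the stop codon."""
    span = end_nt - start_nt + 1
    if span % 3:
        raise ValueError(f"{span} nt is not a whole number of codons")
    return span // 3 - 1  # the last codon is the stop codon


# Coordinates and listed lengths taken from Tables 4 and 5.
listed = {
    "ORF1": (7697, 10465, 922), "ORF4": (12850, 19875, 2341),
    "ORF7": (50449, 51303, 284), "ORF10": (69929, 85429, 5166),
}
for orf, (start, end, aa) in listed.items():
    print(orf, predicted_length_aa(start, end), aa)
```

Under this convention the biosynthetic ORFs agree with their listed lengths. For ORFA, ORFB and ORFX the same arithmetic gives one residue fewer than listed, so those three entries appear to count the stop codon as a residue.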

Example 2

Gene Replacement, Characterization of Integrants and Excisants

A. Gene Replacement

In order to develop an S. cyaneogriseus strain capable of direct fermentative production of 23-keto-Fα, derivatives of the Fα production strain were sought in which the module 3 ketoreductase domain had been replaced with nonfunctional variants. A series of directed amino acid substitutions, each intended to disrupt ketoreductase activity while minimally affecting the rest of the polyketide synthase, was designed as follows. A multiple amino acid sequence alignment was generated in which the predicted amino acid sequence of the module 3 ketoreductase domain from the S. cyaneogriseus Fα biosynthetic gene cluster was aligned with the predicted amino acid sequences of a large number of biologically active ketoreductase domains. These ketoreductase domain sequences were from the S. avermitilis avermectin biosynthetic gene cluster, the Saccharopolyspora erythraea erythromycin biosynthetic gene cluster, the S. hygroscopicus rapamycin biosynthetic gene cluster, the S. caelestis niddamycin biosynthetic gene cluster, and the Amycolatopsis mediterranei rifamycin biosynthetic gene cluster. Three ketoreductase domains known to be nonfunctional (so-called “cryptic” ketoreductase domains from module 3 of the Saccharopolyspora erythraea erythromycin biosynthetic gene cluster, module 4 of the S. caelestis niddamycin biosynthetic gene cluster, and module 3 of the Amycolatopsis mediterranei rifamycin biosynthetic gene cluster) were also included in the alignment. This alignment readily supported the identification of relatively invariant amino acid sequences common to the majority of biologically active ketoreductase domains, but absent from (or altered in) nonfunctional ketoreductase domains.
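The motif search described above can be sketched as a column-by-column comparison over a gapped alignment: keep the columns where nearly all active ketoreductase domains share one residue and every cryptic domain carries something else. The sequences below are hypothetical seven-residue windows and the 0.9 conservation threshold is an assumption; neither is taken from the alignment actually used.

```python
from collections import Counter


def active_only_columns(active, cryptic, min_frac=0.9):
    """Return (column, residue) pairs conserved in active KRs but absent from cryptic KRs.

    All sequences must be rows of the same gapped alignment ("-" marks a gap).
    """
    width = len(active[0])
    if any(len(s) != width for s in active + cryptic):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for col in range(width):
        residues = [s[col] for s in active if s[col] != "-"]
        if not residues:
            continue
        top, n = Counter(residues).most_common(1)[0]
        # Gaps in active sequences count against conservation.
        if n / len(active) >= min_frac and all(s[col] != top for s in cryptic):
            hits.append((col, top))
    return hits


active = ["GGTGTLG", "GGTGALG", "GGSGTLG"]  # hypothetical active KR windows
cryptic = ["GASGTLA"]                        # hypothetical cryptic KR window
print(active_only_columns(active, cryptic))  # [(1, 'G'), (6, 'G')]
```

Columns flagged this way are candidates for substitution; conserved columns that cryptic domains share with active ones (columns 0, 3 and 5 here) are skipped because they do not distinguish function.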

Methods were also developed for gene replacement in S. cyaneogriseus by homologous recombination, so that the native module 3 ketoreductase domain of the Fα biosynthetic gene cluster could be replaced with the engineered variants of the module 3 ketoreductase domain, as described herein.

1. Construction of Plasmids for Site-Directed Mutagenesis

The QuikChange™ site-directed mutagenesis procedure is a double-stranded method based on the polymerase chain reaction that requires two mutagenic oligonucleotides, one corresponding to each strand of the double-stranded region of DNA. The method is less efficient when large plasmids, particularly large plasmids containing high G+C content DNA, are used. Consequently, site-directed mutagenesis of the Fα module 3 ketoreductase domain was performed in a vector designated pKR0.9 (see FIG. 3), which consists of the 900 bp BstEII-AatII fragment of pNE57 (containing the desired region of the Fα module 3 ketoreductase domain) cloned into the BstEII-AatII sites of pSL301 (Invitrogen, Carlsbad, Calif.).

2. Site-Directed Mutagenesis

Five variants of the Fα module 3 ketoreductase domain were generated by site-directed mutagenesis using reagents, materials and procedures provided by the manufacturer of the QuikChange™ Site-Directed Mutagenesis kit (Stratagene, La Jolla, Calif.). The following amino acid substitutions were generated in pKR0.9, using the mutagenic oligonucleotides indicated below:

“179” GGTGTLG (SEQ ID NO: 13) to GAASTLG (SEQ ID NO: 14)
5′-CTGGTGACGGGCGCTGCAAGCACTCTGGGGGCG (SEQ ID NO: 15)
3′-GACCACTGCCCGCGACGTTCGTGAGACCCCCGC (SEQ ID NO: 16)

“204” LVSRRGM (SEQ ID NO: 17) to LVAAAGM (SEQ ID NO: 18)
5′-GCGGCATCTGCTGCTGGTGGCAGCGGCAGGCATGGCCGCCGCCGGTG (SEQ ID NO: 19)
3′-CGCCGTAGACGACGACCACCGTCGCCGTCCGTACCGGCGGCGGCCAC (SEQ ID NO: 20)

“260” HTAGVLD (SEQ ID NO: 21) to HTPPLLD (SEQ ID NO: 22)
5′-GACCGCTGTGGTGCACACGCCACCTCTCCTGGACGACGCCACCGTG (SEQ ID NO: 23)
3′-CTGGCGACACCACGTGTGCGGTGGAGAGGACCTGCTGCGGTGGCAC (SEQ ID NO: 24)

“283” GAKVD (SEQ ID NO: 25) to GAAVD (SEQ ID NO: 26)
5′-GATGCGGTGCTCGGGGCGGCTGTGGACGGTGCCCTGCAC (SEQ ID NO: 27)
3′-CTACGCCACGAGCCCCGCCGACACCTGCCACGGGACGTG (SEQ ID NO: 28)

“306” VLFSSAA (SEQ ID NO: 29) to VLFAAAA (SEQ ID NO: 30)
5′-GTCGGCGTTCGTGCTGTTCGCAGCGGCCGCCGGGGTCCTGG (SEQ ID NO: 31)
3′-CAGCCGCAAGCACGACAAGCGTCGCCGGCGGCCCCAGGACC (SEQ ID NO: 32)
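Each mutagenic primer pair can be checked in silico: the 5′ oligonucleotide should translate, in one reading frame, to a window containing the substituted motif, and the 3′ oligonucleotide (written 3′ to 5′) should be its base-for-base complement. A minimal sketch using the standard genetic code; the helper names are illustrative:

```python
# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str, frame: int) -> str:
    """Translate seq from offset frame (0, 1 or 2), ignoring a trailing partial codon."""
    seq = seq[frame:]
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def frame_encoding(primer: str, peptide: str):
    """Return the first reading frame whose translation contains peptide, else None."""
    for frame in range(3):
        if peptide in translate(primer, frame):
            return frame
    return None


def is_complement(top_5to3: str, bottom_3to5: str) -> bool:
    """A 3'->5' oligo written left to right should be the base-wise complement."""
    return top_5to3.translate(COMPLEMENT) == bottom_3to5


P179_TOP = "CTGGTGACGGGCGCTGCAAGCACTCTGGGGGCG"     # SEQ ID NO: 15
P179_BOTTOM = "GACCACTGCCCGCGACGTTCGTGAGACCCCCGC"  # SEQ ID NO: 16
P204_TOP = "GCGGCATCTGCTGCTGGTGGCAGCGGCAGGCATGGCCGCCGCCGGTG"  # SEQ ID NO: 19

print(frame_encoding(P179_TOP, "GAASTLG"))   # 0
print(frame_encoding(P204_TOP, "LVAAAGM"))   # 1
print(is_complement(P179_TOP, P179_BOTTOM))  # True
```

Scanning all three frames matters because several of the oligonucleotides do not start on a codon boundary; the “204” oligonucleotide, for example, encodes LVAAAGM only from its second base.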

The QuikChange™ mutagenesis reactions contained 125 ng of each of the mutagenic oligonucleotides, 50 ng of pKR0.9, 0.7 μl of Pfu DNA polymerase, and 2.5% DMSO in final reaction volumes of 50 μl. The reactions were subjected to 22 cycles of amplification (95° C. for 45 seconds, 63° C. for 1 minute, and 70° C. for 10 minutes), and amplified products were cloned according to detailed procedures provided by the manufacturer. After completing the site-directed mutagenesis procedure, colonies were picked and used to inoculate 2 ml LB media supplemented with 100 μg/ml carbenicillin. Plasmid DNA was purified from each of these cultures using reagents, materials and procedures included in the QIAprep® 8 Turbo Miniprep Kits, and the mutated 900 bp BstEII-AatII region of the Fα module 3 ketoreductase domain was sequenced in its entirety in order to confirm that the desired changes had been made.

3. Construction of Plasmids for Integration

A three-way ligation was used to combine the five site-directed mutants of the Fα module 3 ketoreductase domain with flanking DNA to facilitate homologous integration using the pKC1132 backbone. The three components included: the 4.3 Kbp NotI-BstEII fragment of pNE57 (containing the majority of the Fα module 3 adjacent to the regions mutagenized); the 1.1 Kbp BstEII-PstI fragments of six pKR0.9 constructs (containing the five site-directed mutants of the Fα module 3 ketoreductase domain, and the wild-type Fα module 3 ketoreductase domain); and the 3.6 Kbp PstI-NotI fragment of pKC1132 (containing all of the elements necessary for selection and replication of the resultant plasmid in E. coli and Streptomyces). These manipulations resulted in the generation of the pFDmod3/5.2 plasmid series. These plasmids were then used to construct versions of the plasmids for integration from which approximately 1 Kbp of flanking DNA had been removed. These plasmids were constructed by digesting each of the pFDmod3/5.2 plasmids with EcoRI. This EcoRI site is immediately adjacent to the NotI site in pKC1132 that was used to introduce the 4.3 Kbp NotI-BstEII fragment of pNE57 (containing the majority of the Fα module 3). The 5′ overhang left by EcoRI was filled in using T4 DNA polymerase under standard reaction conditions, and the linearized plasmids were digested with MscI. The digests were resolved by electrophoresis through 0.8% w/v agarose, the desired fragments were excised from the gel, and the DNA was extracted from the agarose using reagents, materials and procedures included in the QIAquick II Gel Extraction System from QIAGEN (Valencia, Calif.). Purified DNA was collected by ethanol precipitation and ligated to generate the pFDmod3/4.2 plasmid series (see FIG. 5).

Plasmids of the pFDmod3/5.2 series (see FIG. 4) and the pFDmod3/4.2 series (see FIG. 5) were transformed into E. coli ET12567 (pUZ8002) using methods described herein. These transformed E. coli strains were then used as the source of DNA for conjugal transfer to S. cyaneogriseus using methods described herein.

4. Isolation and Analysis of Genomic DNA from S. cyaneogriseus Transconjugants and Excisants

A method modified from methods presented in D. A. Hopwood et al., Genetic Manipulation of Streptomyces, A Laboratory Manual, John Innes Foundation Press, Norwich, UK (1985) (“Isolation of Streptomyces “Total” DNA”: Procedure 4) was used for the isolation of small amounts of genomic DNA from S. cyaneogriseus strains. Putative S. cyaneogriseus transconjugants and excisants were picked and used to inoculate 3 ml KB3 medium (10 g/L Bacto-tryptone, 5 g/L yeast extract, 3 g/L beef extract, 1 g/L KH₂PO₄, 1 g/L K₂HPO₄, 1.5 g/L Difco agar, pH 6.8 and 0.5 ml/L of a trace metal solution containing 30 g/L FeSO₄, 30 g/L ZnSO₄.7H₂O, 4 g/L MnSO₄, 4 g/L CuCl₂.5H₂O, 0.4 g/L CoCl₂.6H₂O). The cultures were incubated at 31° C., with shaking at 220 rpm, for 24-28 hours. The cells in 500 μl aliquots of these cultures were collected by centrifugation in a microfuge at 13,000 rpm for 5 minutes, and the supernatant was discarded. After washing the cell pellets with water, they were suspended in 450 μl of SET (0.3 M sucrose, 25 mM EDTA, 25 mM Tris, pH 8.0, containing 4 mg/ml lysozyme and 50 μg/ml RNaseA), and the suspensions were incubated at 37° C. for 2-4 hours. 250 μl of a 2% solution of SDS was added, and the samples were vortexed for 1 minute. The samples were extracted with 250 μl of phenol:CHCl₃ (1:1) and the phases were resolved by centrifugation in a microfuge at 13,000 rpm for 5 minutes. The aqueous layer was removed to a new tube, and after adding 1/10 volume of 3 M sodium acetate, the DNA was precipitated by adding an equal volume of isopropanol. Precipitated DNA was collected by centrifugation in a microfuge at 13,000 rpm for 5 minutes, washed with −20° C. 70% ethanol, and suspended in 100 μl of water.

For the isolation of larger amounts of genomic DNA from S. cyaneogriseus strains, 25 ml KB3 medium (10 g/L Bacto-tryptone, 5 g/L yeast extract, 3 g/L beef extract, 1 g/L KH₂PO₄, 1 g/L K₂HPO₄, 1.5 g/L Difco agar, pH 6.8 and 0.5 ml/L of a trace metal solution containing 30 g/L FeSO₄, 30 g/L ZnSO₄.7H₂O, 4 g/L MnSO₄, 4 g/L CuCl₂.5H₂O, 0.4 g/L CoCl₂.6H₂O) was inoculated with mycelial fragments of the strain of interest. The cultures were incubated at 31° C., with shaking at 220 rpm, for 24-28 hours. The cells in 3 ml aliquots of these cultures were collected by centrifugation in a microfuge at 13,000 rpm for 5 minutes, and the supernatant was discarded. After washing the cell pellets with water, genomic DNA was isolated using reagents, materials and procedures included in the DNeasy™ system for the isolation of total (plant) DNA from QIAGEN (Valencia, Calif.).

5. Characterization of Transconjugants

Putative transconjugants were plated on CM agar (5 g/L corn steep liquor, 5 g/L Bacto-peptone, 10 g/L soluble starch, 0.5 g/L NaCl, 0.5 g/L CaCl₂.2H₂O, 20 g/L Bacto-agar) plates containing 100 μg/ml apramycin, 30 μg/ml nalidixic acid, 50 μg/ml cycloheximide, and 50 μg/ml nystatin A. These plates were incubated at 31° C. until the colonies were well-established. Genomic DNA was then isolated from the putative transconjugants using methods described herein, for analysis by Southern blot and nucleotide sequence analysis as follows. Aliquots of the genomic DNA preparations were digested with HindIII/StuI and with SalI. The fragments were resolved by electrophoresis through 0.8% w/v agarose, and blotted onto Nytran™ membranes (commercially available from Schleicher & Schuell BioScience, Inc. USA, Keene, N.H.) for Southern analysis according to well-established procedures similar to those described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989). Typically, these Southern blots were probed with the mod3-specific probe, which was generated as described herein. The expected sizes of the fragments were:

Strain | HindIII/StuI | SalI
S. cyaneogriseus production strain 142 | 10.8 Kbp | 4.6 Kbp
S. cyaneogriseus production strain 142/pFDmod3/5.2 transconjugants | 13.3 Kbp | 4.6 Kbp + 3.3 Kbp
S. cyaneogriseus production strain 142/pFDmod3/4.2 transconjugants | 12.3 Kbp | 4.6 Kbp + 3.3 Kbp
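Scoring a blot against the expected-fragment table amounts to a lookup with a sizing tolerance. A minimal sketch, assuming a ±0.3 Kbp tolerance for band sizing (an assumption, not a value from this specification) and hypothetical observed bands:

```python
# Expected mod3-probe fragments (Kbp), transcribed from the table above.
EXPECTED = {
    "production strain 142": {"HindIII/StuI": [10.8], "SalI": [4.6]},
    "142/pFDmod3/5.2 transconjugant": {"HindIII/StuI": [13.3], "SalI": [4.6, 3.3]},
    "142/pFDmod3/4.2 transconjugant": {"HindIII/StuI": [12.3], "SalI": [4.6, 3.3]},
}


def bands_match(observed, expected, tol):
    """Same number of bands, each within tol Kbp after sorting."""
    if len(observed) != len(expected):
        return False
    return all(abs(o - e) <= tol for o, e in zip(sorted(observed), sorted(expected)))


def classify_blot(observed, tol=0.3):
    """Return every genotype whose expected pattern fits the observed bands in all digests."""
    return [
        genotype
        for genotype, digests in EXPECTED.items()
        if all(bands_match(observed.get(d, []), bands, tol) for d, bands in digests.items())
    ]


# Hypothetical band sizes read from a blot.
print(classify_blot({"HindIII/StuI": [13.2], "SalI": [3.4, 4.6]}))
```

An empty result flags a colony for re-examination rather than forcing an assignment, and the HindIII/StuI digest is what separates the two transconjugant series, since their SalI patterns are identical.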

Transconjugants that appeared correct on the basis of the Southern analysis were further characterized by amplifying the region of interest using standard polymerase chain reaction (PCR) and sequencing the PCR products to confirm that the desired sequence had been obtained. Two primer pairs were used to characterize the transconjugants. Each pair comprised one mod3-specific primer and one primer specific for vector-derived sequences, and the pairs were designed such that one would amplify products from the “right side of the cassette” and the other from the “left side of the cassette.” The primer pairs used were:

Left: (mod70F) 5′-TACTGCGCCACACGGAGCCCGAG (SEQ ID NO:33) and (P6568B) 5′-TGGGTAACGCCAGGGTTTTC (SEQ ID NO:34)
Right: (PECOR1F) 5′-GGAAACAGCTATGACATGATTACG (SEQ ID NO:35) and (mod3633B) 5′-TCGGAGCCGCTCCACCTGAG (SEQ ID NO:36)

With genomic DNA isolated from a “correct” transconjugant as a template, these primer pairs would direct the amplification of 6.4 Kbp and 5.7 Kbp products, respectively. The region of these PCR products containing the ketoreductase domain was sequenced to confirm that the desired sequence had been obtained, using the following oligonucleotide sequencing primers:

“179” Transconjugants: Forward 5′-CCTGATGGACGCGGGTGCGC (SEQ ID NO: 37); Reverse 5′-GACACCGAAACCCCTG (SEQ ID NO: 38)
“204” Transconjugants: Forward 5′-CCTGATGGACGCGGGTGCGC (SEQ ID NO: 39); Reverse 5′-GCCGTGTGCACCACAGCGGTCAG (SEQ ID NO: 40)
“260”, “283”, “306” Transconjugants: Forward 5′-GTGTGATGTCGCCGACCGCGCCCAGGTC (SEQ ID NO: 41); Reverse 5′-GCGCTGGTGGGCCAGGGCGTCC (SEQ ID NO: 42)
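The expected product sizes for the transconjugant primer pairs follow from where each primer anneals. A minimal in-silico PCR sketch, assuming exact-match primer binding on the top strand (real primers tolerate some mismatch, and the genomic template is not reproduced here), computes the product length from the 5′ end of the forward primer to the far end of the reverse primer's binding site. The template is a hypothetical toy sequence.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def pcr_product_length(template: str, fwd: str, rev: str):
    """Length of the product primed by fwd and rev (both written 5'->3'), or None."""
    start = template.find(fwd)
    if start < 0:
        return None
    # The reverse primer anneals to the bottom strand, so on the top strand
    # its binding site reads as its reverse complement.
    site = reverse_complement(rev)
    end = template.find(site, start + len(fwd))
    if end < 0:
        return None
    return end + len(site) - start


toy = "TTT" + "ACGTAC" + "GGGGCCCC" + "GATTCA" + "AAA"  # hypothetical template
print(pcr_product_length(toy, "ACGTAC", "TGAATC"))      # 20
```

Pairing one vector-specific primer with one mod3-specific primer means a product forms only across a vector/chromosome junction, which is why each pair reports on one side of the integrated cassette.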

6. Excision and Characterization of Excisants

Transconjugants that had been verified by Southern analysis and by nucleotide sequence analysis of PCR products as described herein were used to inoculate 25 ml of KB3 medium (10 g/L Bacto-tryptone, 5 g/L yeast extract, 3 g/L beef extract, 1 g/L KH₂PO₄, 1 g/L K₂HPO₄, 1.5 g/L Difco agar, pH 6.8 and 0.5 ml/L of a trace metal solution containing 30 g/L FeSO₄, 30 g/L ZnSO₄.7H₂O, 4 g/L MnSO₄, 4 g/L CuCl₂.5H₂O, 0.4 g/L CoCl₂.6H₂O), and the cultures were incubated at 31° C. with shaking at 220 rpm, for 48 hours. A 500 μl aliquot of the culture was transferred into a fresh 25 ml of KB3 medium, and incubation was continued at 31° C. with shaking at 220 rpm, for an additional 48 hours. This process was repeated for multiple rounds, in the absence of selection, in order to allow the excision event to occur. After rounds 3-6, serial dilutions of the cultures were prepared from 10⁻¹ to 10⁻⁵, and 250 μl aliquots of the 10⁻³ to 10⁻⁵ dilutions were plated onto 140 mm diameter CM agar plates (5 g/L corn steep liquor, 5 g/L Bacto-peptone, 10 g/L soluble starch, 0.5 g/L NaCl, 0.5 g/L CaCl₂.2H₂O, 20 g/L Bacto-agar). These plates were incubated at 31° C. for 48-96 hours, until colonies were well-established. Individual colonies were then picked and patched in replicate onto CM plates and onto CM plates supplemented with 100 μg/ml apramycin. These plates were incubated at 31° C. for up to 5 days, at which time colonies sensitive to apramycin, but capable of growing normally in the absence of selection, were identified. Genomic DNA was then isolated from these putative excisants using methods described herein. Using these genomic DNA preparations as templates, the region of interest was amplified using the polymerase chain reaction (PCR), and the PCR products were sequenced to confirm that the desired sequence had been obtained. The primer pair used for amplification was:

(mod70F) 5′-TACTGCGCCACACGGAGCCCGAG (SEQ ID NO: 33) and (mod3633B) 5′-TCGGAGCCGCTCCACCTGAG (SEQ ID NO: 36)

With genomic DNA isolated from a “correct” excisant as a template, these PCR primers would direct the amplification of a 6.6 Kbp product. The region of these PCR products containing the ketoreductase domain was sequenced to confirm that the desired sequence had been obtained, using the following oligonucleotide sequencing primers:

“179” Excisants: Forward 5′-CCTGATGGACGCGGGTGCGC (SEQ ID NO: 37); Reverse 5′-GACACCGAAACCCCTG (SEQ ID NO: 38)
“204” Excisants: Forward 5′-CCTGATGGACGCGGGTGCGC (SEQ ID NO: 39); Reverse 5′-GCCGTGTGCACCACAGCGGTCAG (SEQ ID NO: 40)
“260”, “283”, “306” Excisants: Forward 5′-GTGTGATGTCGCCGACCGCGCCCAGGTC (SEQ ID NO: 41); Reverse 5′-GCGCTGGTGGGCCAGGGCGTCC (SEQ ID NO: 42)
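The replica-patch screen used to find excisants is a simple filter: keep colonies that grow without selection but not on apramycin. A minimal sketch with hypothetical scores; the function name and data layout are illustrative:

```python
def putative_excisants(patches):
    """Select colonies that grow on CM but not on CM supplemented with apramycin.

    patches maps a colony identifier to a pair of booleans:
    (growth on CM, growth on CM + apramycin).
    Loss of apramycin resistance indicates loss of the integrated vector.
    A second crossover can leave either the wild-type or the engineered
    ketoreductase allele, so each hit still needs PCR and sequencing.
    """
    return sorted(
        colony for colony, (on_cm, on_cm_apr) in patches.items()
        if on_cm and not on_cm_apr
    )


# Hypothetical replica-patch scores.
scores = {"c1": (True, True), "c2": (True, False), "c3": (False, False), "c4": (True, False)}
print(putative_excisants(scores))  # ['c2', 'c4']
```

Requiring growth on plain CM excludes colonies that simply failed to transfer during patching, which would otherwise look apramycin-sensitive.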

B. Fermentation and Analysis of Fermentation Products

Seed flasks containing 25 ml of KB3 medium (10 g/L Bacto-tryptone, 5 g/L yeast extract, 3 g/L beef extract, 1 g/L KH₂PO₄, 1 g/L K₂HPO₄, 1.5 g/L Difco agar, pH 6.8 and 0.5 ml/L of a trace metal solution containing 30 g/L FeSO₄, 30 g/L ZnSO₄.7H₂O, 4 g/L MnSO₄, 4 g/L CuCl₂.5H₂O, 0.4 g/L CoCl₂.6H₂O) were inoculated with 500 μl of a suspension of S. cyaneogriseus mycelial fragments (either fresh or frozen) and the cultures were incubated at 31° C. with shaking at 220 rpm, for 48 hours. A 500 μl aliquot of the seed culture was transferred into production flasks containing 25 ml of SD2 production medium (85.5 g/L glucose, 0.36 g/L KCl, 0.72 g/L MgSO₄.7H₂O, 7.2 g/L CaCO₃, 4.86 g/L (NH₄)₂SO₄, 0.72 g/L K₂HPO₄, 7.2 g/L Pharmamedia, and 1.8 ml/L of a trace metal solution containing 30 g/L FeSO₄, 30 g/L ZnSO₄.7H₂O, 4 g/L MnSO₄, 4 g/L CuCl₂.5H₂O, 0.4 g/L CoCl₂.6H₂O) and the cultures were incubated at 31° C. for 10 days. Starting at (typically) 120 hours, and continuing through the end of the fermentation, 100 μl aliquots of the production culture were removed and combined with 900 μl of methanol. The suspensions were vortexed for 1 minute, clarified by centrifugation in a microfuge at 13,000 rpm for 10 minutes, and 10 μl aliquots of the extract were analyzed by reversed phase HPLC.
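The per-litre recipe above translates directly into per-flask amounts, and the extraction step fixes how much culture each HPLC injection represents (100 μl of culture in 1 ml total, of which 10 μl is injected). A small calculator sketch, with names chosen here for illustration and assuming the culture and methanol volumes are additive:

```python
# SD2 production medium, per litre, transcribed from the text above.
SD2_G_PER_L = {
    "glucose": 85.5, "KCl": 0.36, "MgSO4.7H2O": 0.72, "CaCO3": 7.2,
    "(NH4)2SO4": 4.86, "K2HPO4": 0.72, "Pharmamedia": 7.2,
}
SD2_ML_PER_L = {"trace metal solution": 1.8}


def scale_recipe(per_litre, volume_ml):
    """Amount of each component (g or ml, matching the input) for volume_ml of medium."""
    return {name: amount * volume_ml / 1000.0 for name, amount in per_litre.items()}


def culture_equivalent_ul(sample_ul, methanol_ul, injected_ul):
    """Volume of whole culture represented by one injection of the methanol extract."""
    return injected_ul * sample_ul / (sample_ul + methanol_ul)


flask = scale_recipe(SD2_G_PER_L, 25)  # one 25 ml production flask
print(round(flask["glucose"], 4))      # 2.1375 (g)
print(round(scale_recipe(SD2_ML_PER_L, 25)["trace metal solution"], 4))  # 0.045 (ml)
print(culture_equivalent_ul(100, 900, 10))  # 1.0
```

Each 10 μl injection therefore corresponds to about 1 μl of whole culture, which is the factor needed to convert peak amounts back to titres.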

For analysis by reversed phase HPLC, samples were subjected to chromatography on a Waters Model 625 Liquid Chromatography Station equipped with a Waters Model 996 Photodiode Array Detector, a Waters Model 717 Autosampler, and a Waters Nova-Pak C₁₈ column (8 mm×100 mm). The column was equilibrated in and eluted with a mobile phase containing 60% (v/v) acetonitrile and 40% (v/v) 100 mM ammonium acetate, pH 4.5, at a flow rate of 2 ml/min. The compounds of interest, Fα and 23-keto-Fα (the precursor of moxidectin), were detected by monitoring their absorbance at 242 nm, and their retention times were compared to those of authentic samples.
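Peak identification rests on matching retention times to those of authentic samples. A minimal sketch of that matching step, assuming a fixed ±0.1 minute window and using made-up reference retention times, since none are reported here:

```python
def assign_peaks(observed_rt_min, standards_rt_min, tol_min=0.1):
    """Map each observed retention time to the first standard within tol_min, else None."""
    assignments = {}
    for rt in observed_rt_min:
        match = None
        for name, ref in standards_rt_min.items():
            if abs(rt - ref) <= tol_min:
                match = name
                break
        assignments[rt] = match
    return assignments


# Hypothetical retention times (minutes); real values come from authentic samples.
standards = {"Fα": 6.0, "23-keto-Fα": 7.5}
print(assign_peaks([6.02, 7.48, 3.1], standards))
```

A retention-time match alone is a weak identification; with a photodiode array detector the UV spectrum of each matched peak can also be compared with that of the authentic sample.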

In the foregoing, there has been provided a detailed description of particular embodiments of the present invention for the purpose of illustration and not limitation. It is to be understood that all other modifications, ramifications and equivalents obvious to those having skill in the art based on this disclosure are intended to be included within the scope of the invention as claimed. 

1. A purified and isolated nucleic acid molecule encoding at least one protein of the biosynthetic pathway for producing an LL-F28249 compound, wherein said nucleic acid molecule is isolated from an antibiotic-producing wild-type or mutant Streptomyces.
 2. The nucleic acid molecule according to claim 1, wherein the nucleic acid molecule is isolated from an antibiotic-producing wild-type or mutant Streptomyces cyaneogriseus subsp. noncyanogenus.
 3. The nucleic acid molecule according to claim 1, wherein the LL-F28249 compound is LL-F28249α.
 4. The nucleic acid molecule according to claim 1, wherein the molecule has the nucleotide sequence set forth in SEQ ID NO:1 or its complementary strand.
 5. A nucleic acid sequence which hybridizes to the sequence of the nucleic acid molecule of claim 4 and encodes a protein of the biosynthetic pathway for producing an LL-F28249 compound.
 6. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 7697-10465 of SEQ ID NO:1.
 7. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 10791-11570 of SEQ ID NO:1.
 8. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 11659-12462 of SEQ ID NO:1.
 9. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 12850-19875 of SEQ ID NO:1.
 10. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 19865-31036 of SEQ ID NO:1.
 11. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 31115-49246 of SEQ ID NO:1.
 12. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 50449-51303 of SEQ ID NO:1.
 13. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 51300-52706 of SEQ ID NO:1.
 14. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 52809-69833 of SEQ ID NO:1.
 15. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 69929-85429 of SEQ ID NO:1.
 16. The nucleic acid molecule of claim 4, wherein the molecule comprises nucleotides 85574-86338 of SEQ ID NO:1.
 17. A biologically functional plasmid or vector containing the nucleic acid molecule according to claim 1.
 18. The plasmid or vector according to claim 17, wherein the plasmid or vector comprises Cos11 having ATCC Designation Number PTA-4392, Cos36 having ATCC Designation Number PTA-4393 or Cos40 having ATCC Designation Number PTA-4394.
 19. A suitable host cell stably transformed or transfected by the plasmid or vector according to claim 17.
 20. The host cell according to claim 19, wherein the host is Escherichia, Actinomycetales, Bacillus, Corynebacteria or Thermoactinomyces.
 21. The host cell according to claim 20, wherein the host is Escherichia coli, Streptomyces lividans, Streptomyces coelicolor, Streptomyces griseofuscus or Streptomyces ambofaciens.
 22. A biosynthesis protein encoded by the nucleic acid molecule according to claim 1.
 23. The protein according to claim 22, wherein the amino acid sequence is set forth in any one of SEQ ID NO:2 to SEQ ID NO:12, or a biologically active variant thereof.
 24. A process for the production of a protein involved in the biosynthesis of an LL-F28249 compound, said process comprising: growing, under suitable nutrient conditions, a prokaryotic or eukaryotic host cell transformed or transfected with a nucleic acid molecule according to claim 1 in a manner allowing expression of the protein product, and isolating the desired protein product of the expression of the nucleic acid molecule.
 25. A protein product of the expression of the nucleic acid molecule in a prokaryotic or eukaryotic host cell according to claim 24.
 26. A plasmid or a combination of two or three plasmids for cloning the nucleic acid molecule which encodes the proteins of the biosynthetic pathway of an LL-F28249 compound, wherein said plasmid or combination contains the nucleic acid molecule that spans the entire biosynthetic gene cluster and encodes a type I polyketide synthase that is responsible for producing the LL-F28249 compound.
 27. The combination according to claim 26, which comprises Cos11 having ATCC Designation Number PTA-4392, Cos36 having ATCC Designation Number PTA-4393 and Cos40 having ATCC Designation Number PTA-4394.
 28-35. (canceled)